{
 "cells": [
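  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Fine-tuning Embedding Models\n",
    "Open-source embedding models are usually pre-trained on large general-purpose corpora, so within a single specialized domain they may not capture the full meaning of a sentence. Fine-tuning an embedding model on a small domain-specific dataset can help it extract that meaning more accurately, which in turn can improve retrieval quality in a RAG pipeline."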
   ]
  },
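  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## I. Fine-tuning an Embedding Model with Sentence Transformers\n",
    "`Sentence Transformers` is an open-source framework built on `PyTorch` and `Transformers`. It provides a large collection of embedding models for sentences, text, and images, and it also wraps the fine-tuning workflow in a convenient API. In this section we use it to fine-tune an embedding model.\n",
    "### 1. Dataset formats and their loss functions\n",
    "To fine-tune an embedding model we need to load a dataset and pick a loss function that matches its format. `Sentence Transformers` supports the four dataset formats shown in the table below, each with its own set of loss functions:\n",
    "\n",
    "| Dataset format | Loss functions |\n",
    "| --- | --- |\n",
    "| Sentence pairs | MultipleNegativesRankingLoss<br>MegaBatchMarginLoss |\n",
    "| Sentence pairs with a similarity label | ContrastiveLoss<br>SoftmaxLoss<br>CosineSimilarityLoss |\n",
    "| Triplets | TripletLoss |\n",
    "| Single sentences with a class label | BatchAllTripletLoss<br>BatchHardTripletLoss<br>BatchHardSoftMarginTripletLoss<br>BatchSemiHardTripletLoss |\n",
    "\n",
    "Next we go through each dataset format and its loss functions.\n",
    "\n",
    "> If you are not yet familiar with loss functions, consider first reading the short primer on loss functions appended to this chapter; it lays the groundwork for what follows.\n",
    "\n",
    "#### 1.1 Sentence pairs\n",
    "Pairs of related sentences, such as a question and its answer, a full text and its summary, or two questions with the same meaning."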
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'sequence': ['Datawhale是什么?', 'Datawhale是一个专注于数据科学与AI领域的开源组织。']}\n"
     ]
    }
   ],
   "source": [
    "from datasets import Dataset\n",
    "sequence = [['Datawhale是什么?','Datawhale是一个专注于数据科学与AI领域的开源组织。'], ['llm_universe是什么?','llm_universe是一个面向小白开发者的大模型应用开发教程。']]\n",
    "\n",
    "ds = Dataset.from_dict({'sequence': sequence})\n",
    "print(ds[0])"
   ]
  },
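  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For sentence-pair data, the supported input format is `Texts: (anchor, positive) pairs`, where `anchor` is the first (reference) sentence and `positive` is a second sentence that is similar to it. Two loss functions are commonly used:\n",
    "1. `MultipleNegativesRankingLoss`: for a batch of $n$ sentence pairs, each pair $(x_i, y_i)$ is treated as a positive example, and the `anchor` $x_i$ combined with the `positive` $y_j$ of every other pair in the batch ($j \\neq i$) is treated as a negative example. A similarity function such as cosine similarity scores every combination, and the scores are passed to a cross-entropy loss together with labels marking which candidate is the true positive. Because every additional pair in the batch supplies more negatives, this loss generally performs better as the batch size grows.\n",
    "$$L=-\\frac{1}{n}\\sum_{i=1}^{n}\\bigg[S(x_i,y_i)-\\log\\sum_{j=1}^{n}e^{S(x_i,y_j)}\\bigg]$$\n",
    "where $n$ is the batch size, $x_i$ and $y_i$ are the $i$-th `anchor` and `positive`, and $S$ embeds the two input sentences and returns their similarity score (multiplied by the `scale` parameter inside the library). Only the `model` argument is required when instantiating the loss. The default similarity function is cosine similarity; to use dot-product similarity instead, additionally set `scale=1, similarity_fct=util.dot_score`.\n",
    "\n",
    "2. `MegaBatchMarginLoss`: for the `anchor` of each pair in the batch, this loss searches the `positive` sentences of the other pairs for the one with the highest cosine similarity to that anchor and uses it as the hard `negative` of the pair. A batch size of 500 or more is recommended. The resulting loss is:\n",
    "$$L=\\frac{1}{n}\\sum_{i=1}^{n}\\big[\\max(0,\\ positive \\, margin-cos \\, sim(\\vec{anchor_i},\\vec{positive_i}))+\\max(0,\\ cos \\, sim(\\vec{anchor_i},\\vec{negative_i})-negative \\, margin)\\big]$$\n",
    "where $n$ is the batch size, $positive \\, margin$ is the cosine similarity that an `anchor` and its `positive` are expected to reach after fine-tuning (pairs already above it contribute no loss), and $negative \\, margin$ is the cosine similarity that an `anchor` and its `negative` are expected to stay below.\n",
    "Only `model` is required when instantiating this loss; `positive_margin` and `negative_margin` can optionally be set as well."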
   ]
  },
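  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the `MultipleNegativesRankingLoss` formula concrete, here is a small worked example. The scores are made up for illustration (they do not come from a real model) and are assumed to be already multiplied by `scale`. Take a batch of $n=2$ pairs with $S(x_1,y_1)=2$, $S(x_1,y_2)=0$, $S(x_2,y_1)=2$ and $S(x_2,y_2)=3$. Each term of the sum can be rewritten as $\\ell_i=\\log\\sum_{j}e^{S(x_i,y_j)}-S(x_i,y_i)$, so:\n",
    "$$\\ell_1=\\log(e^{2}+e^{0})-2=\\log(1+e^{-2})\\approx 0.127$$\n",
    "$$\\ell_2=\\log(e^{2}+e^{3})-3=\\log(1+e^{-1})\\approx 0.313$$\n",
    "$$L=\\frac{1}{2}(\\ell_1+\\ell_2)\\approx 0.220$$\n",
    "The second anchor contributes more loss because its in-batch negative $y_1$ scores almost as high as its true positive $y_2$."
   ]
  },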
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 1.2 Sentence pairs with a similarity label\n",
    "A pair of sentences plus a label describing how similar they are."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'premise': 'Datawhale是一个专注于数据科学与AI领域的开源组织。',\n",
       " 'hypothesis': '我今天很开心！',\n",
       " 'label': 0}"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "premise = ['Datawhale是一个专注于数据科学与AI领域的开源组织。', 'Datawhale是一个专注于数据科学与AI领域的开源组织。', 'Datawhale是一个专注于数据科学与AI领域的开源组织。']\n",
    "hypothesis = ['我今天很开心！','我参加了Datawhale的组队学习' , 'Datawhale是一个开源组织。']\n",
    "# Labels 0, 1, 2 encode the degree of similarity: dissimilar, partially similar, similar\n",
    "label = [0, 1, 2]\n",
    "\n",
    "ds = Dataset.from_dict({'premise': premise, 'hypothesis': hypothesis, 'label': label})\n",
    "ds[0]"
   ]
  },
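  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Before fine-tuning, this kind of data has to be converted into a format that `Sentence Transformers` supports, typically `Texts: (sentence_A, sentence_B) pairs, Labels: class`. Three loss functions are commonly used, and each expects a slightly different label format:\n",
    "1. `ContrastiveLoss`: suited to datasets with only two classes. The formula is:\n",
    "$$L=\\frac{1}{n}\\sum_{i=1}^{n}\\{label_i\\cdot\\frac{1}{2}(cos \\, dis(\\vec{sent\\_A_i},\\vec{sent\\_B_i}))^2+(1-label_i)\\cdot\\frac{1}{2}[\\max(0,margin-cos \\, dis(\\vec{sent\\_A_i},\\vec{sent\\_B_i}))]^2\\}$$\n",
    "\n",
    "where $n$ is the batch size, $label_i=0$ marks a dissimilar pair and $label_i=1$ a similar pair, $margin$ is the minimum distance that `sentence_A` and `sentence_B` should keep in pairs labeled 0, and $cos \\, dis(\\vec{a},\\vec{b})$ is the cosine distance (one minus the cosine similarity) between vectors $\\vec{a}$ and $\\vec{b}$. Similar pairs are therefore pulled together, while dissimilar pairs are pushed apart until they are at least $margin$ away from each other. Only `model` is required at instantiation; optionally we can also set $margin$ or replace cosine distance with Manhattan or Euclidean distance.\n",
    "\n",
    "2. `SoftmaxLoss`: suited to datasets whose labels span multiple classes. As the figure below shows, `sentence_A` and `sentence_B` are embedded as $u$ and $v$; then $u$, $v$ and $|u-v|$ are concatenated and fed to a softmax classifier, which outputs the probability of each class for the pair. The loss is the cross-entropy between these probabilities and the label.\n",
    "<div align=\"center\">\n",
    "  <img src=\"./figures/SBERT_SoftmaxLoss.png\" alt=\"SoftmaxLoss\" width=\"400\" />\n",
    "</div>\n",
    "\n",
    "Instantiating this loss requires at least three arguments: `model` (the model to fine-tune), `sentence_embedding_dimension` (the dimension of the sentence embeddings the model outputs) and `num_labels` (the number of classes).\n",
    "\n",
    "3. `CosineSimilarityLoss`: computes the cosine similarity between the embeddings of `sentence_A` and `sentence_B` and compares it with the `label` using the mean squared error (MSE). Here the label is the similarity of the two sentences, given as a float between 0 and 1. Only `model` is required at instantiation."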
   ]
  },
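  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a worked illustration of `ContrastiveLoss`, consider three pairs with made-up cosine distances and an assumed $margin=0.5$. The example follows the convention described above: label 1 means similar, label 0 means dissimilar, and the margin applies to pairs labeled 0.\n",
    "$$\\ell_1=\\frac{1}{2}\\cdot 0.3^2=0.045 \\quad (label=1,\\ cos \\, dis=0.3)$$\n",
    "$$\\ell_2=\\frac{1}{2}\\cdot[\\max(0,\\ 0.5-0.2)]^2=0.045 \\quad (label=0,\\ cos \\, dis=0.2)$$\n",
    "$$\\ell_3=\\frac{1}{2}\\cdot[\\max(0,\\ 0.5-0.7)]^2=0 \\quad (label=0,\\ cos \\, dis=0.7)$$\n",
    "$$L=\\frac{1}{3}(0.045+0.045+0)=0.03$$\n",
    "The third pair is dissimilar and already farther apart than the margin, so it no longer contributes to the loss."
   ]
  },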
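  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 1.3 Triplets\n",
    "Each triplet consists of an anchor sentence (anchor), a positive sentence that is semantically close to the anchor (positive), and a negative sentence that is semantically different from the anchor (negative)."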
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'anchor': '我该怎么学习RAG？', 'positive': '大家是怎么学习RAG的？', 'negative': 'RAG是什么？'}\n"
     ]
    }
   ],
   "source": [
    "anchor = ['我该怎么学习RAG？', '我要如何养花？']\n",
    "positive = ['大家是怎么学习RAG的？', '养花时要注意什么？']\n",
    "negative = ['RAG是什么？', '玫瑰花有多少种？']\n",
    "\n",
    "ds = Dataset.from_dict({'anchor': anchor, 'positive': positive, 'negative': negative})\n",
    "print(ds[0])"
   ]
  },
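  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For triplet data, the supported input format is `Texts: (anchor, positive, negative) triplets`, and the commonly used loss function is `TripletLoss`:\n",
    "$$L=\\max(||\\vec{anc}-\\vec{pos}||-||\\vec{anc}-\\vec{neg}||+margin,\\ 0)$$\n",
    "where `margin` is the slack by which the negative should be farther from the anchor than the positive, and $||\\vec{a}-\\vec{b}||$ is the Euclidean distance between vectors $\\vec{a}$ and $\\vec{b}$. Only `model` is required at instantiation; the margin can also be set manually (in `TripletLoss` the parameter is named `triplet_margin`)."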
   ]
  },
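  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A short worked example may help here. The distance terms below are made up for illustration, and $margin=1$ is an assumed value rather than the library default. For a triplet whose anchor-positive term is $1.2$ and whose anchor-negative term is $2.0$:\n",
    "$$L=\\max(1.2-2.0+1,\\ 0)=0.2$$\n",
    "For a triplet whose anchor-positive term is $0.5$ and whose anchor-negative term is $2.5$:\n",
    "$$L=\\max(0.5-2.5+1,\\ 0)=0$$\n",
    "In the second triplet the negative is already farther from the anchor than the positive by more than the margin, so it produces no gradient."
   ]
  },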
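  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 1.4 Single sentences with a class label\n",
    "Individual sentences, each with a label giving its class (there must be at least two classes, and each class needs at least two examples)."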
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'question': 'Datawhale是什么?', 'label': 0}\n"
     ]
    }
   ],
   "source": [
    "question = ['Datawhale是什么?', 'llm_universe是什么?', '我今天的早餐是豆浆。', '我好想吃火锅呀！']\n",
    "# 0: computing / internet, 1: food\n",
    "label = [0, 0, 1, 1]\n",
    "\n",
    "ds = Dataset.from_dict({'question': question, 'label': label})\n",
    "print(ds[0])"
   ]
  },
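  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For this kind of data, the supported input format is `Texts: single sentences, Labels: class`. Four loss functions are commonly used:\n",
    "1. `BatchAllTripletLoss`: builds every valid triplet from the sentences in a batch (the anchor and the positive share a label, the negative has a different one) and computes the loss with `TripletLoss`. Each batch therefore needs at least two classes with at least two examples per class. Only `model` is required at instantiation; the `margin` can also be set manually.\n",
    "2. `BatchHardTripletLoss`: for each anchor, uses only the farthest `positive` and the nearest `negative` in the batch, that is, the hardest triplet. Only `model` is required at instantiation; the `margin` can likewise be set manually.\n",
    "3. `BatchHardSoftMarginTripletLoss`: largely the same as `BatchHardTripletLoss`, except that the final loss is computed with a logarithm and an exponential, $\\log(1+e^{x})$ of the distance difference, instead of a hinge. The margin is therefore soft and does not have to be set by hand; only `model` has to be provided at instantiation.\n",
    "4. `BatchSemiHardTripletLoss`: also builds triplets and scores them with `TripletLoss`, but keeps only semi-hard ones, in which the negative is farther from the anchor than the positive yet still inside the margin: `dis(anchor, positive) < dis(anchor, negative) < dis(anchor, positive) + margin`. Only `model` is required at instantiation; the `margin` can likewise be set manually."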
   ]
  },
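  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To see how the batch-based losses pick their triplets, take a single anchor with made-up distances (for illustration only, with an assumed $margin=0.5$): two sentences with the same label at distances $0.4$ and $0.9$, and two sentences with a different label at distances $0.7$ and $1.5$. `BatchAllTripletLoss` scores all four positive/negative combinations with $\\max(d_{pos}-d_{neg}+margin,\\ 0)$:\n",
    "$$\\max(0.4-0.7+0.5,\\ 0)=0.2,\\quad \\max(0.4-1.5+0.5,\\ 0)=0,\\quad \\max(0.9-0.7+0.5,\\ 0)=0.7,\\quad \\max(0.9-1.5+0.5,\\ 0)=0$$\n",
    "`BatchHardTripletLoss` keeps only the farthest positive ($0.9$) and the nearest negative ($0.7$), so this anchor contributes $0.7$. `BatchHardSoftMarginTripletLoss` uses the same pair but replaces the hinge with a soft margin:\n",
    "$$\\log(1+e^{0.9-0.7})=\\log(1+e^{0.2})\\approx 0.798$$"
   ]
  },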
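  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2. Fine-tuning and evaluating the embedding model\n",
    "Now that we know the main dataset formats and their loss functions, we can start fine-tuning an embedding model."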
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/Users/lta/anaconda3/envs/llm_universe_2.x/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.\n",
      "  warnings.warn(\n"
     ]
    }
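   ],
   "source": [
    "# Load the model\n",
    "from sentence_transformers import SentenceTransformer\n",
    "'''\n",
    "model_name_or_path: a Hugging Face model ID (downloaded on first use), or a local path if the model files are already on disk\n",
    "device: the device used for computation; if unset, gpu, mps, npu and cpu are checked in that order\n",
    "cache_folder: directory where downloaded model files are stored; can also be set through the \"SENTENCE_TRANSFORMERS_HOME\" environment variable, and defaults to the .cache folder\n",
    "trust_remote_code: whether to trust (and run) custom model code downloaded from the remote repository\n",
    "'''\n",
    "# device='mps' targets Apple Silicon; switch to 'cuda' or 'cpu' on other machines\n",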
    "model = SentenceTransformer(model_name_or_path='BAAI/bge-small-zh-v1.5', device='mps', cache_folder='./bge_small', trust_remote_code=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Structure of med_dataset as first loaded:\n",
      "DatasetDict({\n",
      "    train: Dataset({\n",
      "        features: ['question1', 'question2', 'label'],\n",
      "        num_rows: 1000\n",
      "    })\n",
      "})\n",
      "Structure of the training split of med_dataset:\n",
      "Dataset({\n",
      "    features: ['question1', 'question2', 'label'],\n",
      "    num_rows: 1000\n",
      "})\n",
      "Label classes in med_dataset:\n",
      "{0, 1}\n",
      "First record in the training split of med_dataset:\n",
      "{'question1': '艾滋患者能用可善挺吗？', 'question2': 'JUNCTURE研究', 'label': 0}\n"
     ]
    }
   ],
   "source": [
    "# Load the dataset\n",
    "from datasets import load_dataset\n",
    "\n",
    "dataset = load_dataset(path='vegaviazhang/Med_QQpairs', cache_dir='./medical')\n",
    "# Print the dataset to inspect its structure\n",
    "print(f'Structure of med_dataset as first loaded:\\n{dataset}')\n",
    "print('Structure of the training split of med_dataset:\\n{}'.format(dataset['train']))\n",
    "# Collect the distinct label values\n",
    "labels = set(dataset['train']['label'])\n",
    "print(f'Label classes in med_dataset:\\n{labels}')\n",
    "print('First record in the training split of med_dataset:\\n{}'.format(dataset['train'][0]))"
   ]
  },
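  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Because the `vegaviazhang/Med_QQpairs` dataset consists of sentence pairs plus a label describing how similar the two sentences are, it falls into the second format described above (sentence pairs with a similarity label). And because the label takes only the two values 0 and 1, we fine-tune the model with `ContrastiveLoss`."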
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "7348008a87974d0dbcd4a40b9ee189a8",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Epoch:   0%|          | 0/10 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "7c9ddeaf9c8643d29214bf78f550917f",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Iteration:   0%|          | 0/25 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "f376b4315a094a23a0f4d215a3916e34",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Iteration:   0%|          | 0/25 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "fd70c1ed6ead44a686437286942cc9c0",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Iteration:   0%|          | 0/25 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "ae95ae9e548541dc9f0a1734578bc3a2",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Iteration:   0%|          | 0/25 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "fbc1600f81b94409a7bf6b239f3742a4",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Iteration:   0%|          | 0/25 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "0cd89463dd3b4bedbab044133e40d75a",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Iteration:   0%|          | 0/25 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "70b986fa52b544249c8748d69a4175bf",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Iteration:   0%|          | 0/25 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "f28b3795c32b430ca4ec78956a508bbc",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Iteration:   0%|          | 0/25 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "1f5ba372dcb44e5a8668a41bd2eaa2a9",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Iteration:   0%|          | 0/25 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "b43a30b460a54efa9ad60e5dd6f2729a",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Iteration:   0%|          | 0/25 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "from sentence_transformers import InputExample\n",
    "from torch.utils.data import DataLoader\n",
    "from sentence_transformers import losses\n",
    "from sentence_transformers.evaluation import BinaryClassificationEvaluator\n",
    "# Shuffle the dataset\n",
    "dataset = dataset['train'].shuffle(seed=42)\n",
    "# Convert every record into the InputExample form that ContrastiveLoss expects,\n",
    "# then split the examples 0.8 / 0.2 into a training set and a validation set.\n",
    "# The first record printed above, for instance, becomes texts=['艾滋患者能用可善挺吗？', 'JUNCTURE研究'], label=0\n",
    "examples = []\n",
    "for i in range(dataset.num_rows):\n",
    "    example = dataset[i]\n",
    "    examples.append(InputExample(texts=[example['question1'], example['question2']], label=example['label']))\n",
    "\n",
    "train_examples = examples[:800]\n",
    "dev_examples = examples[800:]\n",
    "\n",
    "# Wrap the training examples in a DataLoader\n",
    "train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=32)\n",
    "\n",
    "# Instantiate the loss function\n",
    "train_loss = losses.ContrastiveLoss(model=model)\n",
    "# Instantiate the evaluator, which measures the model on the validation set after each epoch\n",
    "evaluator = BinaryClassificationEvaluator.from_input_examples(dev_examples, name='med-dev')\n",
    "# Path where the fine-tuned model is saved\n",
    "model_save_path='./medical_bge_small'\n",
    "# Fine-tune the model\n",
    "model.fit([(train_dataloader, train_loss)],\n",
    "          evaluator=evaluator,\n",
    "          epochs=10,\n",
    "          output_path=model_save_path,\n",
    "          )"
   ]
  },
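  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The progress bars above are not rendered properly in the saved notebook. When the code runs normally, the progress output looks like this:\n",
    "![Progress bar](./figures/progress_bar.png)"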
   ]
  },
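  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a quick sanity check on the progress bars, the number of iterations per epoch follows from the split and batch size used in the training cell. The 800 training examples are processed in batches of 32, so each epoch takes\n",
    "$$\\lceil 800/32 \\rceil = 25$$\n",
    "iterations, which matches the `0/25` shown in every `Iteration` bar. With `epochs=10`, the whole run therefore performs\n",
    "$$10 \\times 25 = 250$$\n",
    "optimization steps, and the remaining 200 examples form the validation set used by the evaluator."
   ]
  },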
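  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next we load and plot the scores the model achieved on the validation set during training. Because `ContrastiveLoss` here computes its loss from cosine similarity, we plot the cosine-similarity-related `cossim_accuracy_threshold` data, which refers to the accuracy obtained when sentence pairs are classified by whether their cosine similarity exceeds a threshold.\n",
    "> Note: fine-tuning an embedding model does not guarantee an improvement. Fine-tuning places demands on both the quality and the quantity of the data; if the dataset is too small relative to the number of model parameters, the model will tend to overfit."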
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAA2AAAAINCAYAAABYjxyUAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjguMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/H5lhTAAAACXBIWXMAAA9hAAAPYQGoP6dpAABV/klEQVR4nO3deXhU5d3/8c/MkGUSkrAlIYFAEDAsoiBLymKlNrKIKVR9lKUScEGKihALBiSoZUmRGvPUBbS/UrGAgrutitUoVhYBQVEfEsKisiYhomQj28z8/sAZTANIhmROZub9uq5cF3PmnjPfE8aQj/d9f4/J4XA4BAAAAABodGajCwAAAAAAf0EAAwAAAAAPIYABAAAAgIcQwAAAAADAQwhgAAAAAOAhBDAAAAAA8BACGAAAAAB4CAEMAAAAADykmdEFeCu73a6jR48qLCxMJpPJ6HIAAAAAGMThcKikpESxsbEym88/x0UAc9PRo0cVFxdndBkAAAAAmohDhw6pffv25x1DAHNTWFiYpNPf5PDwcIOrAQAAAGCU4uJixcXFuTLC+RDA3ORcdhgeHk4AAwAAAHBBW5NowgEAAAAAHkIAAwAAAAAPIYABAAAAgIewB6wRORwO1dTUyGazGV0KcFEsFouaNWvGLRcAAAAuEgGskVRVVenYsWMqLy83uhSgQYSEhCgmJkaBgYFGlwIAAOC1CGCNwG636+uvv5bFYlFsbKwCAwOZOYDXcjgcqqqq0vHjx/X111+ra9euP3uDQQAAAJwdAawRVFVVyW63Ky4uTiEhIUaXA1w0q9WqgIAAffvtt6qqqlJwcLDRJQEAAHgl/jd2I2KWAL6EzzMAAMDF4zcqAAAAAPAQAhgAAAAAeAgBDIYYOnSoZsyY4XocHx+vrKysRjv/+TT0e3uL5557Ti1atHA9fvjhh9W7d2/D6gEAAPAHNOFAk7B9+3aFhoYaXQYAAADQqAhgaBIiIyONLgEAAABodE1iCeJTTz2l+Ph4BQcHKzExUdu2bTvv+KysLCUkJMhqtSouLk4zZ85URUWF6/mSkhLNmDFDHTt2lNVq1aBBg7R9+/Za53A4HJo/f75iYmJktVqVlJSkvXv3Nsr1Od+vvKrG418Oh6NedQ4dOlT33nuvZsyYoZYtWyo6Olp//etfVVZWpsmTJyssLExdunTRO++843rNV199pZEjR6p58+aKjo7WrbfeqqKiItfzZWVlmjhxopo3b66YmBg99thjdd73v5cB/vDDD7rrrrsUHR2t4OBgXXbZZfrXv/4lSfruu+80btw4tWvXTiEhIerVq5deeOGFev6N1FZSUqJx48YpNDRU7dq101NPPVXr+dzcXA0ZMkTBwcHq0aOH3n//fZlMJr3++uuuMYcOHdLNN9+sFi1aqFWrVho9erS++eabC3r/DRs2aMCAAQoNDVWLFi00ePBgffvtt5LOLA1csWKFOnTooObNm2vatGmy2Wx69NFH1bZtW0VFRWnRokW1zpmZmalevXopNDRUcXFxmjZtmkpLSy/q+wQAAICLY/gM2Nq1a5Wamqrly5crMTFRWVlZGj58uPbs2aOoqKg649esWaO0tDStWLFCgwYNUl5eniZNmiSTyaTMzExJ0h133KGvvvpK//jHPxQbG6tVq1YpKSlJu3fvVrt27SRJjz76qP7yl79o5cqV6tSpk9LT0zV8+HDt3r27Ue5xdKraph7z323w8/6c3X8crpDA+v01r1y5UrNnz9a2bdu0du1a/f73v9drr72m3/72t5o7d64ef/xx3XrrrTp48KCqqqp0zTXX6I477tDjjz+uU6dO6YEHHtDNN9+sDz74QJI0a9YsffTRR3rjjTcUFRWluXPnaufOnefcb2S32zVy5EiVlJRo1apV6ty5s3bv3i2LxSJJqqioUN++ffXAAw8oPDxcb731lm699VZ17txZAwYMcOv7tHTpUs2dO1ePPPKI3n33Xd13
33269NJLde2118pms2nMmDHq0KGDtm7dqpKSEt1///21Xl9dXa3hw4dr4MCB+vjjj9WsWTMtXLhQI0aM0BdffKHAwMBzvndNTY3GjBmjO++8Uy+88IKqqqq0bdu2Wjfv3r9/v9555x2tX79e+/fv10033aQDBw7o0ksv1UcffaTNmzfrtttuU1JSkhITEyWdbhv/l7/8RZ06ddKBAwc0bdo0zZ49W08//bRb3yMAAABcPJOjvlMkDSwxMVH9+/fXk08+KUmuGxjfe++9SktLqzP+nnvuUU5OjrKzs13H7r//fm3dulUbN27UqVOnFBYWpjfeeEOjRo1yjenbt69GjhyphQsXyuFwKDY2Vvfff7/+8Ic/SJJOnjyp6OhoPffccxo7duzP1l1cXKyIiAidPHlS4eHhtZ6rqKjQ119/rU6dOrnCXHlVjVcEsKFDh8pms+njjz+WJNlsNkVEROiGG27Q888/L0nKz89XTEyMtmzZovfff18ff/yx3n33zLUdPnxYcXFx2rNnj2JjY9W6dWutWrVK//M//yNJOnHihNq3b68pU6a4Zr3i4+M1Y8YMzZgxQ//+9781cuRI5eTk6NJLL72guq+//np169ZNf/7zn13X0bt37wtqrhEfH6/u3bvXmtUbO3asiouL9fbbb2v9+vVKTk7WoUOH1LZtW0nS+++/r2uvvVavvfaaxowZo1WrVmnhwoXKyclxBaeqqiq1aNFCr7/+uoYNG3bO9z9x4oRat26tDRs26Oqrr67z/MMPP6ylS5cqPz9fYWFhkqQRI0Zoz5492r9/v+v+XN26ddOkSZPO+t+NJL388suaOnWqa3byueee04wZM/TDDz+43uf111/X559/ftbXn+1zDQAAgPNng/9m6AxYVVWVduzYoTlz5riOmc1mJSUlacuWLWd9zaBBg7Rq1Spt27ZNAwYM0IEDB/T222/r1ltvlXR6NsFms9X5BdFqtWrjxo2SpK+//lr5+flKSkpyPR8REaHExERt2bLlggJYfVkDLNr9x+ENft4Led/6uvzyy11/tlgsat26tXr16uU6Fh0dLUkqLCzUrl279OGHH6p58+Z1zrN//36dOnVKVVVVrlkZSWrVqpUSEhLO+f6ff/652rdvf87wZbPZtHjxYq1bt05HjhxRVVWVKisrFRISUu9rdRo4cGCdx87wtmfPHsXFxbnCl6Q6M227du3Svn37XAHJqaKiQvv37z/ve7dq1UqTJk3S8OHDde211yopKUk333yzYmJiXGPi4+NrnTs6OloWi6XWzZGjo6NVWFjoevz+++8rIyNDubm5Ki4uVk1NjSoqKlReXn5R3ysA8HXHTp7SrkM/GF0GgAti0ojL2v78sCbE0ABWVFQkm83m+oXeKTo6Wrm5uWd9zfjx41VUVKQhQ4bI4XCopqZGU6dO1dy5cyVJYWFhGjhwoBYsWKDu3bsrOjpaL7zwgrZs2aIuXbpIOj2D43yf/35f53P/rbKyUpWVla7HxcXF9bpWk8lU76WARgkICKj12GQy1TrmnOGx2+0qLS1VcnKylixZUuc8MTEx2rdvX73f32q1nvf5pUuX6n//93+VlZXl2uM0Y8YMVVVV1fu9Gkppaan69u2r1atX13nuQhqM/P3vf9f06dO1fv16rV27VvPmzdN7772nX/ziF5J+/u/Eecxut0uSvvnmG11//fX6/e9/r0WLFqlVq1bauHGjbr/9dlVVVRHAAOAcHA6H/mf5Fh3+/pTRpQC4ABazSfsXX2d0GfXiHYngJzZs2KDFixfr6aefVmJiovbt26f77rtPCxYsUHp6uiTpH//4h2677Ta1a9dOFotFV155pcaNG6cdO3a4/b4ZGRl65JFHGuoyfMaVV16pV155RfHx8WrWrO7HqXPnzgoICNDWrVvVoUMHSdL333+vvLy8sy63k07PwB0+fFh5eXlnnQXbtGmTRo8erd/97neSTgfBvLw89ejRw+3r+OSTT+o87t69uyQp
ISFBhw4dUkFBgSu0/3dTlyuvvFJr165VVFTUz047n0ufPn3Up08fzZkzRwMHDtSaNWtcAay+duzYIbvdrscee8w1S7Zu3Tq3zgUA/uTYyQod/v6UzCbpyg4tjS4HwM8wm00/P6iJMTSAtWnTRhaLRQUFBbWOFxQU1Fru9VPp6em69dZbdccdd0iSevXqpbKyMk2ZMkUPPvigzGazOnfurI8++khlZWUqLi5WTEyMbrnlFl1yySWS5Dp3QUFBrWVeBQUF52wMMWfOHKWmproeFxcXKy4uzu1r9xV33323/vrXv2rcuHGaPXu2WrVqpX379unFF1/U//t//0/NmzfX7bffrlmzZql169aKiopy/T2dy9VXX61f/vKXuvHGG5WZmakuXbooNzdXJpNJI0aMUNeuXfXyyy9r8+bNatmypTIzM1VQUHBRAWzTpk169NFHNWbMGL333nt66aWX9NZbb0mSrr32WnXu3FkpKSl69NFHVVJSonnz5kk6Mxs4YcIELV26VKNHj9Yf//hHtW/fXt9++61effVVzZ49W+3btz/ne3/99dd69tln9Zvf/EaxsbHas2eP9u7dq4kTJ7p9PV26dFF1dbWeeOIJJScna9OmTVq+fLnb5wMAf7H76OkVLpdGh+nl3w8yuBoAvsjQNvSBgYHq27dvrYYadrtd2dnZdfbkOJWXl9f55d3ZHe+/+4mEhoYqJiZG33//vd59912NHj1aktSpUye1bdu21vsWFxdr69at53zfoKAghYeH1/qCFBsbq02bNslms2nYsGHq1auXZsyYoRYtWrj+npYuXaqrrrpKycnJSkpK0pAhQ9S3b9/znveVV15R//79NW7cOPXo0UOzZ8+WzWaTJM2bN09XXnmlhg8frqFDh6pt27YaM2bMRV3H/fffr08//VR9+vTRwoULlZmZqeHDT+/Zs1gsev3111VaWqr+/fvrjjvu0IMPPihJrr2GISEh+s9//qMOHTrohhtuUPfu3XX77beroqLiZz8rISEhys3N1Y033qhLL71UU6ZM0d1336277rrL7eu54oorlJmZqSVLluiyyy7T6tWrlZGR4fb5AMBf5Bw7HcB6xPDvPIDGYXgXxLVr1yolJUXPPPOMBgwYoKysLK1bt065ubmKjo7WxIkT1a5dO9cvjw8//LAyMzP17LPPupYg/v73v1ffvn21du1aSdK7774rh8OhhIQE7du3T7NmzVJwcLA+/vhj176ZJUuW6E9/+lOtNvRffPHFBbehr28XRPiWTZs2aciQIdq3b586d+5sdDkewecagD/4/aodeuerfM0b1V13XHWJ0eUA8BJe0wVRkm655RYdP35c8+fPV35+vnr37q3169e79tocPHiw1ozXvHnzZDKZNG/ePB05ckSRkZFKTk6udRPakydPas6cOTp8+LBatWqlG2+8UYsWLarVtGD27NmupYs//PCDhgwZovXr1/OLJc7qtddeU/PmzdW1a1fXvsPBgwf7TfgCAH+xmxkwAI3M8Bkwb8UMmHf4+OOPNXLkyHM+X1paekHnef7557Vw4UIdPHhQbdq0UVJSkh577DG1bt36gl5/tjb9Tu+8846uuuqqCzqPkfhcA/B1pZU1uuyh0/e1/Cz9WrUMDTS4IgDewqtmwIDG1K9fv3PeWLg+Jk6ceFFNMc5XQ7t27dw+LwCg4eT+OPsVExFM+ALQaAhg8GlWq9V1/zcjNYUaAADnx/JDAJ5gaBdEX8fqTvgSPs8AfJ2zBX13AhiARkQAawTOZh/l5eUGVwI0HOfn+afNbADAl7ha0McSwAA0HpYgNgKLxaIWLVqosLBQ0un7PDlv2At4G4fDofLychUWFqpFixau++4BgC+psdmVm18iiSWIABoXAayRtG3bVpJcIQzwdi1atHB9rgHA13xdVKbKGrtCAy3q0CrE6HIA+DACWCMxmUyKiYlRVFSUqqurjS4HuCgBAQHMfAHwac4GHN1iwmU2s2oFQOMhgDUy
i8XCL64AADRxdEAE4Ck04QAAAH7P2QGRBhwAGhsBDAAA+D1nB0Ra0ANobAQwAADg1wpLKlRUWiWzSUqIDjO6HAA+jgAGAAD8mnP54SWRzWUNZN82gMZFAAMAAH6NBhwAPIkABgAA/FrOsdM3YGb/FwBPIIABAAC/tvvoSUl0QATgGQQwAADgt8qranSgqEwSSxABeAYBDAAA+K09+SVyOKQ2zYMUGRZkdDkA/AABDAAA+C3n/i+WHwLwFAIYAADwW7uP/bj/i+WHADyEAAYAAPyW8x5gzIAB8BQCGAAA8Et2u0O5+T8uQYwJM7gaAP6CAAYAAPzStyfKVV5lU3CAWZ3aNDe6HAB+ggAGAAD8knP5YULbcFnMJoOrAeAvCGAAAMAvnWnAwfJDAJ5DAAMAAH7J1YKeDogAPIgABgAA/BIdEAEYgQAGAAD8zomyKuUXV8hkOr0HDAA8hQAGAAD8Ts6x07NfHVuFqHlQM4OrAeBPCGAAAMDvsPwQgFEIYAAAwO/s/nEGjAYcADyNAAYAAPyOcwasOwEMgIcRwAAAgF+pqLZp//FSSSxBBOB5BDAAAOBX9hWWqsbuUMuQALUNDza6HAB+hgAGAAD8yk8bcJhMJoOrAeBvCGAAAMCvOBtwdOf+XwAMQAADAAB+xdUBkf1fAAxAAAMAAH7D4XAoh3uAATAQAQwAAPiNw9+fUklljQItZnWObG50OQD8EAEMAAD4Defyw67RzRVg4dcgAJ7HTx4AAOA3XB0QuQEzAIMQwAAAgN+gAQcAoxHAAACA33DOgHVnBgyAQQhgAADAL5w8Va0jP5ySRAADYBwCGAAA8As5Py4/bN/SqghrgMHVAPBXBDAAAOAXWH4IoCkggAEAAL/gnAGjAyIAIxHAAACAX6ADIoCmgAAGAAB8XlWNXXsLSiUxAwbAWAQwAADg8/YfL1WVza6w4GZq39JqdDkA/BgBDAAA+Dzn/q/uMeEymUwGVwPAnxHAAACAz3N2QGT5IQCjEcAAAIDP200HRABNBAEMAAD4NIfDcaYFPR0QARiMAAYAAHxafnGFvi+vVjOzSV2imhtdDgA/RwADAAA+zbn/q0tUcwUHWAyuBoC/I4ABAACf5gxg3dn/BaAJIIABAACflpNPAw4ATQcBDAAA+DRXC3oacABoAghgAADAZ5VW1uib78olsQQRQNNAAAMAAD5rz4/LD9uGB6tVaKDB1QAAAQwAAPgwlh8CaGoIYAAAwGftPkYDDgBNCwEMAAD4LFrQA2hqCGAAAMAn1djsys0vkcQSRABNBwEMAAD4pG++K1NljV0hgRZ1bBVidDkAIIkABgAAfNT//WT5odlsMrgaADiNAAYAAHxSzrHTyw+7x4QZXAkAnEEAAwAAPulMB8QIgysBgDMIYAAAwCdxDzAATREBDAAA+JzCkgoVlVbKbJISolmCCKDpIIABAACf49z/1alNqKyBFoOrAYAzCGAAAMDnnFl+yP4vAE0LAQwAAPicMw042P8FoGlpEgHsqaeeUnx8vIKDg5WYmKht27add3xWVpYSEhJktVoVFxenmTNnqqKiwvW8zWZTenq6OnXqJKvVqs6dO2vBggVyOByuMQUFBZo0aZJiY2MVEhKiESNGaO/evY12jQAAwHNyjjnvAcb+LwBNSzOjC1i7dq1SU1O1fPlyJSYmKisrS8OHD9eePXsUFRVVZ/yaNWuUlpamFStWaNCgQcrLy9OkSZNkMpmUmZkpSVqyZImWLVumlStXqmfPnvr00081efJkRUREaPr06XI4HBozZowCAgL0xhtvKDw8XJmZmUpKStLu3bsVGhrq6W8DAABoIKeqbDpwvFQSHRABND2Gz4BlZmbqzjvv1OTJk9WjRw8tX75cISEhWrFixVnHb968WYMHD9b48eMVHx+vYcOGady4cbVmzTZv3qzRo0dr1KhRio+P10033aRhw4a5xuzdu1effPKJli1bpv79+yshIUHLli3TqVOn9MILL3jk
ugEAQOPYU1Aiu0Nq0zxIUWHBRpcDALUYGsCqqqq0Y8cOJSUluY6ZzWYlJSVpy5YtZ33NoEGDtGPHDleYOnDggN5++21dd911tcZkZ2crLy9PkrRr1y5t3LhRI0eOlCRVVlZKkoKDz/xQNpvNCgoK0saNGxv2IgEAgEc5G3Cw/BBAU2ToEsSioiLZbDZFR0fXOh4dHa3c3Nyzvmb8+PEqKirSkCFD5HA4VFNTo6lTp2ru3LmuMWlpaSouLla3bt1ksVhks9m0aNEiTZgwQZLUrVs3dejQQXPmzNEzzzyj0NBQPf744zp8+LCOHTt21vetrKx0BTdJKi4uvtjLBwAAjcC5/4vlhwCaIsOXINbXhg0btHjxYj399NPauXOnXn31Vb311ltasGCBa8y6deu0evVqrVmzRjt37tTKlSv15z//WStXrpQkBQQE6NVXX1VeXp5atWqlkJAQffjhhxo5cqTM5rN/SzIyMhQREeH6iouL88j1AgCA+qEDIoCmzNAZsDZt2shisaigoKDW8YKCArVt2/asr0lPT9ett96qO+64Q5LUq1cvlZWVacqUKXrwwQdlNps1a9YspaWlaezYsa4x3377rTIyMpSSkiJJ6tu3rz7//HOdPHlSVVVVioyMVGJiovr163fW950zZ45SU1Ndj4uLiwlhAAA0MXa7wzUD1pMZMABNkKEzYIGBgerbt6+ys7Ndx+x2u7KzszVw4MCzvqa8vLzOLJXFcvoO98428+caY7fb65wvIiJCkZGR2rt3rz799FONHj36rO8bFBSk8PDwWl8AAKBpOXiiXOVVNgU1Myu+NV2NATQ9hrehT01NVUpKivr166cBAwYoKytLZWVlmjx5siRp4sSJateunTIyMiRJycnJyszMVJ8+fZSYmKh9+/YpPT1dycnJriCWnJysRYsWqUOHDurZs6c+++wzZWZm6rbbbnO970svvaTIyEh16NBBX375pe677z6NGTNGw4YN8/w3AQAANAjn8sNubcPUzOJ1Oy0A+AHDA9gtt9yi48ePa/78+crPz1fv3r21fv16V2OOgwcP1prNmjdvnkwmk+bNm6cjR44oMjLSFbicnnjiCaWnp2vatGkqLCxUbGys7rrrLs2fP9815tixY0pNTVVBQYFiYmI0ceJEpaene+7CAQBAg3N2QKQBB4CmyuRwrttDvRQXFysiIkInT55kOSIAAE3Ebc9t1we5hfrj6J6aODDe6HIA+In6ZAPm5gEAgM/IoQMigCaOAAYAAHzCibIqHTtZIUnqRgAD0EQRwAAAgE9wzn7Ftw5R8yDDt7kDwFkRwAAAgE9wBrDuzH4BaMIIYAAAwCe4OiASwAA0YQQwAADgE5z3AKMFPYCmjAAGAAC8XmWNTfsKSyWxBBFA00YAAwAAXm9vQalq7A61CAlQTESw0eUAwDkRwAAAgNfb/ZP7f5lMJoOrAYBzI4ABAACvRwMOAN6CAAYAALweLegBeAsCGAAA8GoOh4MOiAC8BgEMAAB4tcPfn1JJRY0CLWZ1jmxudDkAcF4EMAAA4NWcs19doporsBm/2gBo2vgpBQAAvFoOyw8BeBECGAAA8Gp0QATgTQhgAADAq9GAA4A3IYABAACvdfJUtQ5/f0qS1L0tAQxA00cAAwAAXiv3x9mvdi2siggJMLgaAPh5BDAAAOC1WH4IwNsQwAAAgNdyNuDoTgMOAF6CAAYAALxWTj4dEAF4FwIYAADwStU2u/LySyVJPVmCCMBLEMAAAIBX2n+8VFU2u8KCmql9S6vR5QDABSGAAQAAr5Rz7Mz+L5PJZHA1AHBhCGAAAMArORtw0AERgDchgAEAAK/kakFPAw4AXoQABgAAvI7D4aAFPQCvRAADAABep6C4Ut+XV8tiNqlrdHOjywGAC0YAAwAAXmf3sZOSpC6RzRUcYDG4GgC4cAQwAADgdWjAAcBbEcAAAIDXyTlWIknqHhNmcCUAUD8EMAAA4HXOdECMMLgSAKgfAhgA
APAqpZU1+ua7MknMgAHwPgQwAADgVfbkF8vhkKLDg9S6eZDR5QBAvRDAAACAV9n94/4vbsAMwBsRwAAAgFehAyIAb0YAAwAAXoUGHAC8GQEMAAB4DZvdoT35pwMYDTgAeCMCGAAA8BpfF5WpotqukECLOrYONbocAKg3AhgAAPAazuWH3dqGyWI2GVwNANQfAQwAAHgNZwOO7nRABOClCGAAAMBr5ByjAyIA70YAAwAAXuNMB0QCGADvRAADAABe4XhJpY6XVMpskrq1JYAB8E4EMAAA4BWcyw/j24TKGmgxuBoAcA8BDAAAeAWWHwLwBQQwAADgFZwdEGnAAcCbEcAAAIBXcM6A0YIegDcjgAEAgCavotqmA8dLJUk9CWAAvBgBDAAANHl78ktkd0htmgcqMizI6HIAwG0EMAAA0OT9dPmhyWQyuBoAcB8BDAAANHmuBhwsPwTg5QhgAACgyXPeA4wOiAC8HQEMAAA0aXa740wAYwYMgJcjgAEAgCbt4IlylVXZFNjMrE5tQo0uBwAuCgEMAAA0ac7Zr25tw9TMwq8uALwbP8UAAECTtpvlhwB8CAEMAAA0aa4OiDTgAOADCGAAAKBJ++k9wADA2xHAAABAk/V9WZWOnayQdHoPGAB4OwIYAABospwNODq2DlFYcIDB1QDAxSOAAQCAJsu1/LAtyw8B+Aa3AlhZWVlD1wEAAFCHqwMiDTgA+Ai3Alh0dLRuu+02bdy4saHrAQAAcHF1QKQBBwAf4VYAW7VqlU6cOKFrrrlGl156qf70pz/p6NGjDV0bAADwY5U1Nu0rLJXEDBgA3+FWABszZoxef/11HTlyRFOnTtWaNWvUsWNHXX/99Xr11VdVU1PT0HUCAAA/s7egVDV2hyKsAYqJCDa6HABoEBfVhCMyMlKpqan64osvlJmZqffff1833XSTYmNjNX/+fJWXlzdUnQAAwM84OyD2iAmXyWQyuBoAaBjNLubFBQUFWrlypZ577jl9++23uummm3T77bfr8OHDWrJkiT755BP9+9//bqhaAQCAH6EBBwBf5FYAe/XVV/X3v/9d7777rnr06KFp06bpd7/7nVq0aOEaM2jQIHXv3r2h6gQAAH7G2YCjOw04APgQtwLY5MmTNXbsWG3atEn9+/c/65jY2Fg9+OCDF1UcAADwTw6Ho9YSRADwFW4FsGPHjikkJOS8Y6xWqx566CG3igIAAP7tyA+nVFxRowCLSV2imhtdDgA0GLeacGzYsEHvvvtunePvvvuu3nnnnYsuCgAA+Dfn8sOuUWEKbHZRPcMAoElx6ydaWlqabDZbneMOh0NpaWn1Pt9TTz2l+Ph4BQcHKzExUdu2bTvv+KysLCUkJMhqtSouLk4zZ85URUWF63mbzab09HR16tRJVqtVnTt31oIFC+RwOFxjSktLdc8996h9+/ayWq3q0aOHli9fXu/aAQBAw3M24GD/FwBf49YSxL1796pHjx51jnfr1k379u2r17nWrl2r1NRULV++XImJicrKytLw4cO1Z88eRUVF1Rm/Zs0apaWlacWKFRo0aJDy8vI0adIkmUwmZWZmSpKWLFmiZcuWaeXKlerZs6c+/fRTTZ48WREREZo+fbokKTU1VR988IFWrVql+Ph4/fvf/9a0adMUGxur3/zmN258VwAAQEPJoQMiAB/l1gxYRESEDhw4UOf4vn37FBoaWq9zZWZm6s4779TkyZNds1AhISFasWLFWcdv3rxZgwcP1vjx4xUfH69hw4Zp3LhxtWbNNm/erNGjR2vUqFGKj4/XTTfdpGHDhtUZk5KSoqFDhyo+Pl5TpkzRFVdc8bOzbwAAoPHtpgEHAB/lVgAbPXq0ZsyYof3797uO7du3T/fff3+9Zo+qqqq0Y8cOJSUlnSnIbFZSUpK2bNly1tcMGjRIO3bscAWlAwcO6O2339Z1111Xa0x2drby8vIkSbt27dLGjRs1cuTIWmPefPNNHTlyRA6HQx9++KHy
8vI0bNiwC64fAAA0vOKKah06cUoSAQyA73FrCeKjjz6qESNGqFu3bmrfvr0k6fDhw7rqqqv05z//+YLPU1RUJJvNpujo6FrHo6OjlZube9bXjB8/XkVFRRoyZIgcDodqamo0depUzZ071zUmLS1NxcXF6tatmywWi2w2mxYtWqQJEya4xjzxxBOaMmWK2rdvr2bNmslsNuuvf/2rfvnLX571fSsrK1VZWel6XFxcfMHXCQAALlzusRJJUrsWVkWEBBhcDQA0LLcCWEREhDZv3qz33ntPu3btktVq1eWXX37O8NKQNmzYoMWLF+vpp59WYmKi9u3bp/vuu08LFixQenq6JGndunVavXq11qxZo549e+rzzz/XjBkzFBsbq5SUFEmnA9gnn3yiN998Ux07dtR//vMf3X333YqNja01I+eUkZGhRx55pNGvDwAAf7f76ElJNOAA4JtMjp+2BvSwqqoqhYSE6OWXX9aYMWNcx1NSUvTDDz/ojTfeqPOaq666Sr/4xS+0dOlS17FVq1ZpypQpKi0tldlsVlxcnNLS0nT33Xe7xixcuFCrVq1Sbm6uTp06pYiICL322msaNWqUa8wdd9yhw4cPa/369XXe92wzYHFxcTp58qTCw/kHAgCAhjL75V1a9+lhTf91V6Vee6nR5QDAzyouLlZERMQFZQO3ZsAkqaysTB999JEOHjyoqqqqWs85Ow3+nMDAQPXt21fZ2dmuAGa325Wdna177rnnrK8pLy+X2Vx765rFYpEkV5v5c42x2+2SpOrqalVXV593zH8LCgpSUFDQBV0XAABw35kGHGEGVwIADc+tAPbZZ5/puuuuU3l5ucrKytSqVSsVFRUpJCREUVFRFxzApNPt4FNSUtSvXz8NGDBAWVlZKisr0+TJkyVJEydOVLt27ZSRkSFJSk5OVmZmpvr06eNagpienq7k5GRXEEtOTtaiRYvUoUMH9ezZU5999pkyMzN12223SZLCw8N19dVXa9asWbJarerYsaM++ugjPf/8865W9gAAwPOqbXblFZRKknrERBhcDQA0PLcC2MyZM5WcnKzly5crIiJCn3zyiQICAvS73/1O9913X73Odcstt+j48eOaP3++8vPz1bt3b61fv97VmOPgwYO1ZqrmzZsnk8mkefPm6ciRI4qMjHQFLqcnnnhC6enpmjZtmgoLCxUbG6u77rpL8+fPd4158cUXNWfOHE2YMEEnTpxQx44dtWjRIk2dOtWdbwkAAGgAB46XqarGrrCgZmrf0mp0OQDQ4NzaA9aiRQtt3bpVCQkJatGihbZs2aLu3btr69atSklJOWcHQ19Sn3WeAADgwrz22WHNXLtLA+Jbad3UgUaXAwAXpD7ZwK37gAUEBLhmpaKionTw4EFJp7sjHjp0yJ1TAgAAKOfHFvTd2f8FwEe5tQSxT58+2r59u7p27aqrr75a8+fPV1FRkf7xj3/osssua+gaAQCAn9h99McGHLGsLgHgm9yaAVu8eLFiYmIkSYsWLVLLli31+9//XsePH9ezzz7boAUCAAD/4HA4ftIBkQYcAHxTvWfAHA6HoqKiXDNdUVFRZ71vFgAAQH0UFFfqRFmVLGaTukY3N7ocAGgU9Z4Bczgc6tKlC3u9AABAg8r5cfarc2SoggMsBlcDAI2j3gHMbDara9eu+u677xqjHgAA4KfOLD9k/xcA3+XWHrA//elPmjVrlr766quGrgcAAPgpGnAA8AdudUGcOHGiysvLdcUVVygwMFBWa+0bJZ44caJBigMAAP7DuQSxOzNgAHyYWwEsKyurgcsAAAD+rKyyRl9/VyaJAAbAt7kVwFJSUhq6DgAA4Mdy80vkcEjR4UFq0zzI6HIAoNG4FcAOHjx43uc7dOjgVjEAAMA/7Wb5IQA/4VYAi4+Pl8lkOufzNpvN7YIAAID/yaEDIgA/4VYA++yzz2o9rq6u1meffabMzEwtWrSoQQoDAAD+gw6IAPyFWwHsiiuuqHOs
X79+io2N1dKlS3XDDTdcdGEAAMA/2OwO5eYzAwbAP7h1H7BzSUhI0Pbt2xvylAAAwMd9812ZKqrtsgZY1LF1qNHlAECjcmsGrLi4uNZjh8OhY8eO6eGHH1bXrl0bpDAAAOAfnMsPu8WEyWI+9x5zAPAFbgWwFi1a1GnC4XA4FBcXpxdffLFBCgMAAP5hNw04APgRtwLYBx98UCuAmc1mRUZGqkuXLmrWzK1TAgAAP+WcAaMFPQB/4FZaGjp0aAOXAQAA/JWrBT0dEAH4AbeacGRkZGjFihV1jq9YsUJLliy56KIAAIB/OF5SqcKSSplMUre2YUaXAwCNzq0A9swzz6hbt251jvfs2VPLly+/6KIAAIB/cM5+dWoTqpBAtjEA8H1uBbD8/HzFxMTUOR4ZGaljx45ddFEAAMA/OAMY+78A+Au3AlhcXJw2bdpU5/imTZsUGxt70UUBAAD/QAdEAP7Grbn+O++8UzNmzFB1dbWuueYaSVJ2drZmz56t+++/v0ELBAAAvsvZAZEGHAD8hVsBbNasWfruu+80bdo0VVVVSZKCg4P1wAMPKC0trUELBAAAvqmi2qb9x0slMQMGwH+4FcBMJpOWLFmi9PR05eTkyGq1qmvXrgoKCmro+gAAgI/KKyiR3SG1Dg1UVBi/QwDwD24FsJMnT8pms6lVq1bq37+/6/iJEyfUrFkzhYfzf7EAAMD5/XT5oclkMrgaAPAMt5pwjB07Vi+++GKd4+vWrdPYsWMvuigAAOD7aMABwB+5FcC2bt2qX/3qV3WODx06VFu3br3oogAAgO+jBT0Af+RWAKusrFRNTU2d49XV1Tp16tRFFwUAAHyb3e5QzrESSXRABOBf3ApgAwYM0LPPPlvn+PLly9W3b9+LLgoAAPi2Q9+Xq7SyRoHNzLqkTajR5QCAx7jVhGPhwoVKSkrSrl279Otf/1rS6fuAbd++Xf/+978btEAAAOB7nA04EqLD1Mzi1v8PBgCv5NZPvMGDB2vLli2Ki4vTunXr9M9//lNdunTRF198oauuuqqhawQAAD4mhwYcAPyUWzNgktS7d2+tXr26IWsBAAB+wtUBkf1fAPyM2wHMqaKiQlVVVbWOcR8wAABwPj+9BxgA+BO3liCWl5frnnvuUVRUlEJDQ9WyZctaXwAAAOfyQ3mVjp6skCR1axtmcDUA4FluBbBZs2bpgw8+0LJlyxQUFKT/9//+nx555BHFxsbq+eefb+gaAQCAD3EuP+zQKkRhwQEGVwMAnuXWEsR//vOfev755zV06FBNnjxZV111lbp06aKOHTtq9erVmjBhQkPXCQAAfIRr+SENOAD4IbdmwE6cOKFLLrlE0un9XidOnJAkDRkyRP/5z38arjoAAOBznDNg3QlgAPyQWwHskksu0ddffy1J6tatm9atWyfp9MxYixYtGqw4AADge3KOlUiiAQcA/+RWAJs8ebJ27dolSUpLS9NTTz2l4OBgzZw5U7NmzWrQAgEAgO+oqrFrXyEBDID/cmsP2MyZM11/TkpKUm5urnbs2KEuXbro8ssvb7DiAACAb9lbWKJqm0MR1gDFRgQbXQ4AeNxF3wdMkjp27KiOHTvWOd6rVy+9/fbbiouLa4i3AQAAXs65/LB7TJhMJpPB1QCA57m1BPFCffPNN6qurm7MtwAAAF7kTAfECIMrAQBjNGoAAwAA+Kndx05KYv8XAP9FAAMAAB7hcDhcM2DdY8IMrgYAjEEAAwAAHnH0ZIWKK2oUYDGpaxQBDIB/IoABAACPcM5+dYkKU2AzfgUB4J/46QcAADziTAMO9n8B8F+NGsCeeeYZRUdHN+ZbAAAAL5FzjP1fAOD2fcC2b9+uDz/8UIWFhbLb7bWey8zMlCSNHz/+4qoDAAA+Y/ePAYwOiAD8mVsBbPHixZo3b54SEhIUHR1d60aK3FQRAAD8t+KKah08US6JJYgA/JtbAex///d/tWLFCk2aNKmBywEAAL4o91iJJCk2Ilgt
QgINrgYAjOPWHjCz2azBgwc3dC0AAMBH5bD8EAAkuRnAZs6cqaeeeqqhawEAAD6KDogAcJpbSxD/8Ic/aNSoUercubN69OihgICAWs+/+uqrDVIcAADwDTTgAIDT3Apg06dP14cffqhf/epXat26NY03AADAOdXY7NpTcHoPWHdmwAD4ObcC2MqVK/XKK69o1KhRDV0PAADwMQeKylRVY1fzoGaKaxlidDkAYCi39oC1atVKnTt3buhaAACAD3Lu/+oeEyazmVUzAPybWwHs4Ycf1kMPPaTy8vKGrgcAAPgY5/4vlh8CgJtLEP/yl79o//79io6OVnx8fJ0mHDt37myQ4gAAgPdztaAngAGAewFszJgxDVwGAADwRQ6H40wLejogAoB7Aeyhhx5q6DoAAIAPKiyp1HdlVbKYTbo0OszocgDAcG7tAQMAALgQztmvS9qEKjjAYnA1AGC8C54Ba9WqlfLy8tSmTRu1bNnyvPf+OnHiRIMUBwAAvBs3YAaA2i44gD3++OMKCwtz/ZmbLwMAgJ+zmwYcAFDLBQewlJQU158nTZrUGLUAAAAfk3OUFvQA8FNu7QHbuXOnvvzyS9fjN954Q2PGjNHcuXNVVVXVYMUBAADvVV5Vo6+/K5NEAAMAJ7cC2F133aW8vDxJ0oEDB3TLLbcoJCREL730kmbPnt2gBQIAAO+Um18ih0OKCgtSZFiQ0eUAQJPgVgDLy8tT7969JUkvvfSSrr76aq1Zs0bPPfecXnnllYasDwAAeCnu/wUAdbkVwBwOh+x2uyTp/fff13XXXSdJiouLU1FRUcNVBwAAvJazAQfLDwHgDLcCWL9+/bRw4UL94x//0EcffaRRo0ZJkr7++mtFR0c3aIEAAMA75dABEQDqcCuAZWVlaefOnbrnnnv04IMPqkuXLpKkl19+WYMGDWrQAgEAgPex2R3KPVYiiSWIAPBTbgWwyy+/XF9++aVOnjyphx56yHV86dKlWrlyZb3P99RTTyk+Pl7BwcFKTEzUtm3bzjs+KytLCQkJslqtiouL08yZM1VRUeF63mazKT09XZ06dZLValXnzp21YMECORwO1xiTyXTWr6VLl9a7fgAAUNs335XpVLVNwQFmxbcONbocAGgyLvg+YD916NAhmUwmtW/fXpK0bds2rVmzRj169NCUKVPqda61a9cqNTVVy5cvV2JiorKysjR8+HDt2bNHUVFRdcavWbNGaWlpWrFihQYNGqS8vDxNmjRJJpNJmZmZkqQlS5Zo2bJlWrlypXr27KlPP/1UkydPVkREhKZPny5JOnbsWK3zvvPOO7r99tt14403uvMtAQAAP+FcftitbbgsZpPB1QBA0+HWDNj48eP14YcfSpLy8/N17bXXatu2bXrwwQf1xz/+sV7nyszM1J133qnJkyerR48eWr58uUJCQrRixYqzjt+8ebMGDx6s8ePHKz4+XsOGDdO4ceNqzZpt3rxZo0eP1qhRoxQfH6+bbrpJw4YNqzWmbdu2tb7eeOMN/epXv9Ill1zixncEAAD8FB0QAeDs3ApgX331lQYMGCBJWrdunS677DJt3rxZq1ev1nPPPXfB56mqqtKOHTuUlJR0piCzWUlJSdqyZctZXzNo0CDt2LHDFaYOHDigt99+29WJ0TkmOzvbda+yXbt2aePGjRo5cuRZz1lQUKC33npLt99++wXXDgAAzm03DTgA4KzcWoJYXV2toKDTN1R8//339Zvf/EaS1K1btzpL+86nqKhINputTufE6Oho5ebmnvU148ePV1FRkYYMGSKHw6GamhpNnTpVc+fOdY1JS0tTcXGxunXrJovFIpvNpkWLFmnChAlnPefKlSsVFhamG2644Zy1VlZWqrKy0vW4uLj4gq8TAAB/45wBowU9ANTm1gxYz549tXz5cn388cd67733NGLECEnS0aNH1bp16wYt8L9t2LBBixcv1tNPP62dO3fq1Vdf1VtvvaUFCxa4xqxbt06rV6/W
mjVrtHPnTq1cuVJ//vOfz9kgZMWKFZowYYKCg4PP+b4ZGRmKiIhwfcXFxTX4tQEA4AuKSitVWFIpk0nq1jbM6HIAoElxawZsyZIl+u1vf6ulS5cqJSVFV1xxhSTpzTffdC1NvBBt2rSRxWJRQUFBreMFBQVq27btWV+Tnp6uW2+9VXfccYckqVevXiorK9OUKVP04IMPymw2a9asWUpLS9PYsWNdY7799ltlZGQoJSWl1vk+/vhj7dmzR2vXrj1vrXPmzFFqaqrrcXFxMSEMAICzcDbg6NQ6VKFBbv2qAQA+y62fikOHDlVRUZGKi4vVsmVL1/EpU6YoJCTkgs8TGBiovn37Kjs7W2PGjJEk2e12ZWdn65577jnra8rLy2U21564s1gskuRqM3+uMXa7vc75/va3v6lv376uEHkuQUFBrmWXAADg3Fh+CADn5vb/lrJYLKqpqdHGjRslSQkJCYqPj6/3eVJTU5WSkqJ+/fppwIABysrKUllZmSZPnixJmjhxotq1a6eMjAxJUnJysjIzM9WnTx8lJiZq3759Sk9PV3JysiuIJScna9GiRerQoYN69uypzz77TJmZmbrttttqvXdxcbFeeuklPfbYY+5+GwAAwH9xzoDRAREA6nIrgJWVlenee+/V888/75pVslgsmjhxop544ol6zYLdcsstOn78uObPn6/8/Hz17t1b69evdzXmOHjwYK3ZrHnz5slkMmnevHk6cuSIIiMjXYHL6YknnlB6erqmTZumwsJCxcbG6q677tL8+fNrvfeLL74oh8OhcePGufNtAAAAZ0EHRAA4N5PDuW6vHu666y69//77evLJJzV48GBJ0saNGzV9+nRde+21WrZsWYMX2tQUFxcrIiJCJ0+eVHg4/8AAACBJFdU29XzoXdnsDm2d+2tFh5+7wRUA+Ir6ZAO3ZsBeeeUVvfzyyxo6dKjr2HXXXSer1aqbb77ZLwIYAACoK6+gRDa7Q61CAxUVxt5pAPhvbrWhLy8vr3PvLkmKiopSeXn5RRcFAAC8U85Plh+aTCaDqwGApsetADZw4EA99NBDqqiocB07deqUHnnkEQ0cOLDBigMAAN7F2QGRBhwAcHZuLUHMysrSiBEj1L59e1f79l27dikoKEj//ve/G7RAAADgPWjAAQDn51YA69Wrl/bu3avVq1crNzdXkjRu3DhNmDBBVqu1QQsEAADewW53KOdYiSTuAQYA5+JWAMvIyFB0dLTuvPPOWsdXrFih48eP64EHHmiQ4gAAgPc4/P0plVbWKLCZWZdEhhpdDgA0SW7tAXvmmWfUrVu3Osd79uyp5cuXX3RRAADA++w+dlKSlBAdpgCLW79iAIDPc+unY35+vmJiYuocj4yM1LFjxy66KAAA4H2cDTi6x4QZXAkANF1uBbC4uDht2rSpzvFNmzYpNjb2oosCAADeZ/eP+79owAEA5+bWHrA777xTM2bMUHV1ta655hpJUnZ2tmbPnq3777+/QQsEAADewXUPsNgIgysBgKbLrQA2a9Ysfffdd5o2bZqqqqokScHBwXrggQc0Z86cBi0QAAA0fT+UV+nID6ckSd1YgggA5+RWADOZTFqyZInS09OVk5Mjq9Wqrl27KigoqKHrAwAAXsDZfj6ulVXhwQEGVwMATZdbAcypefPm6t+/f0PVAgAAvBQ3YAaAC0OPWAAAcNGcHRB7xLD/CwDOhwAGAAAumnMGjBb0AHB+BDAAAHBRqmrs2lf4Ywv6WJYgAsD5EMAAAMBF2VdYqmqbQ+HBzdSuhdXocgCgSSOAAQCAi+JqwBEbLpPJZHA1ANC0EcAAAMBFyXHt/2L5IQD8HAIYAAC4KGc6IBLAAODnEMAAAIDbHA5HrSWIAIDzI4ABAAC3HT1ZoZOnqtXMbFKXqOZGlwMATR4BDAAAuC3nx+WHXaKaK6iZxeBqAKDpI4ABAAC3sfwQAOqHAAYAANxGAw4AqB8CGAAAcFtOPgEMAOqDAAYAANxS
UlGtb78rl8Q9wADgQhHAAACAW3LzSyRJsRHBahkaaHA1AOAdCGAAAMAtzv1fzH4BwIUjgAEAALfk0AERAOqNAAYAANziakHPDBgAXDACGAAAqLcam921B4wZMAC4cAQwAABQb18Xlamqxq7QQIviWoYYXQ4AeA0CGAAAqDfn8sPuMeEym00GVwMA3oMABgAA6s3ZAZHlhwBQPwQwAABQbz+dAQMAXDgCGAAAqBeHw3FmBowABgD1QgADAAD1crykUt+VVclskhLahhldDgB4FQIYAACol//7cflh58jmCg6wGFwNAHgXAhgAAKiXHPZ/AYDbCGAAAKBe6IAIAO4jgAEAgHpxdkCkAQcA1B8BDAAAXLDyqhp9XVQmiSWIAOAOAhgAALhge/JL5HBIkWFBigwLMrocAPA6BDAAAHDBWH4IABeHAAYAAC4YDTgA4OIQwAAAwAWjBT0AXBwCGAAAuCA2u0O5+SWSWIIIAO4igAEAgAvy7XdlKq+yKTjArE5tQo0uBwC8EgEMAABcEGcDjoS24bKYTQZXAwDeiQAGAAAuSA4dEAHgohHAAADABaEDIgBcPAIYAAC4INwDDAAuHgEMAAD8rO9KK1VQXCmTSerWNszocgDAaxHAAADAz8o5drr9fHzrUIUGNTO4GgDwXgQwAADws3YfOymJ5YcAcLEIYAAA4Gc5G3B0j2H5IQBcDAIYAAD4Wc4liHRABICLQwADAADnVVFt077jpZKkHjERBlcDAN6NAAYAAM5rb0GpbHaHWoUGKjo8yOhyAMCrEcAAAMB55Rw7s//LZDIZXA0AeDcCGAAAOC9uwAwADYcABgAAzsvZAZEGHABw8QhgAADgnBwOx0+WIBLAAOBiEcAAAMA5Hf7+lEoqaxRoMatzZHOjywEAr0cAAwAA5/R/Py4/vLRtcwVY+LUBAC4WP0kBAMA50YADABoWAQwAAJwT+78AoGERwAAAwDm5OiASwACgQRDAAADAWZ0sr9aRH05JkrrTgh4AGgQBDAAAnJVz/1f7llaFBwcYXA0A+AYCGAAAOKscGnAAQIMjgAEAgLNydUBk+SEANJgmEcCeeuopxcfHKzg4WImJidq2bdt5x2dlZSkhIUFWq1VxcXGaOXOmKioqXM/bbDalp6erU6dOslqt6ty5sxYsWCCHw1HrPDk5OfrNb36jiIgIhYaGqn///jp48GCjXCMAAN6GBhwA0PCaGV3A2rVrlZqaquXLlysxMVFZWVkaPny49uzZo6ioqDrj16xZo7S0NK1YsUKDBg1SXl6eJk2aJJPJpMzMTEnSkiVLtGzZMq1cuVI9e/bUp59+qsmTJysiIkLTp0+XJO3fv19DhgzR7bffrkceeUTh4eH6v//7PwUHB3v0+gEAaIqqauzaW1giiRb0ANCQTI7/nhbysMTERPXv319PPvmkJMlutysuLk733nuv0tLS6oy/5557lJOTo+zsbNex+++/X1u3btXGjRslSddff72io6P1t7/9zTXmxhtvlNVq1apVqyRJY8eOVUBAgP7xj3+4VXdxcbEiIiJ08uRJhYfzDxMAwLfkHCvWyP/9WGHBzfTFQ8NkMpmMLgkAmqz6ZANDlyBWVVVpx44dSkpKch0zm81KSkrSli1bzvqaQYMGaceOHa5ligcOHNDbb7+t6667rtaY7Oxs5eXlSZJ27dqljRs3auTIkZJOh7y33npLl156qYYPH66oqCglJibq9ddfb6QrBQDAu/x0+SHhCwAajqFLEIuKimSz2RQdHV3reHR0tHJzc8/6mvHjx6uoqEhDhgyRw+FQTU2Npk6dqrlz57rGpKWlqbi4WN26dZPFYpHNZtOiRYs0YcIESVJhYaFKS0v1pz/9SQsXLtSSJUu0fv163XDDDfrwww919dVX13nfyspKVVZWuh4XFxc3xLcAAIAmydmAg+WHANCwmkQTjvrYsGGDFi9erKefflo7d+7Uq6++qrfeeksLFixwjVm3bp1Wr16tNWvW
aOfOnVq5cqX+/Oc/a+XKlZJOz4BJ0ujRozVz5kz17t1baWlpuv7667V8+fKzvm9GRoYiIiJcX3FxcY1/sQAAGCSHDogA0CgMnQFr06aNLBaLCgoKah0vKChQ27Ztz/qa9PR03XrrrbrjjjskSb169VJZWZmmTJmiBx98UGazWbNmzVJaWprGjh3rGvPtt98qIyNDKSkpatOmjZo1a6YePXrUOnf37t1d+8j+25w5c5Samup6XFxcTAgDAPgkh8NxpgU9M2AA0KAMnQELDAxU3759azXUsNvtys7O1sCBA8/6mvLycpnNtcu2WCyS5Gozf64xzpmvwMBA9e/fX3v27Kk1Ji8vTx07djzr+wYFBSk8PLzWFwAAvujYyQr9UF6tZmaTukY3N7ocAPAphrehT01NVUpKivr166cBAwYoKytLZWVlmjx5siRp4sSJateunTIyMiRJycnJyszMVJ8+fZSYmKh9+/YpPT1dycnJriCWnJysRYsWqUOHDurZs6c+++wzZWZm6rbbbnO976xZs3TLLbfol7/8pX71q19p/fr1+uc//6kNGzZ4/HsAAEBT4mzA0SWquYKaWQyuBgB8i+EB7JZbbtHx48c1f/585efnq3fv3lq/fr2rMcfBgwdrzWbNmzdPJpNJ8+bN05EjRxQZGekKXE5PPPGE0tPTNW3aNBUWFio2NlZ33XWX5s+f7xrz29/+VsuXL1dGRoamT5+uhIQEvfLKKxoyZIjnLh4AgCYoh+WHANBoDL8PmLfiPmAAAF/1+1U79M5X+Zo3qrvuuOoSo8sBgCbPa+4DBgAAmh5a0ANA4yGAAQAAl9LKGn37XbkkAhgANAYCGAAAcMn9cfYrJiJYrUIDDa4GAHwPAQwAALhw/y8AaFwEMAAA4OJsQc/yQwBoHAQwAADg4mpBH0sAA4DGQAADAACSpBqbXbn5JZJYgggAjYUABgAAJElfF5Wpssau0ECLOrQKMbocAPBJBDAAACDpTAOObjHhMptNBlcDAL6JAAYAACTRAREAPIEABgAAJJ3pgEgDDgBoPAQwAAAg6UwHRFrQA0DjIYABAAAVllSoqLRKZpOUEB1mdDkA4LMIYAAAwLX88JLI5rIGWgyuBgB8FwEMAADQgAMAPIQABgAAlHPs9A2Y2f8FAI2LAAYAALT76ElJdEAEgMZGAAMAwM+VV9XoQFGZJJYgAkBjI4ABAODn9uSXyOGQ2jQPUmRYkNHlAIBPI4ABAODnnPu/WH4IAI2PAAYAgJ/bfezH/V8sPwSARkcAAwDAzznvAcYMGAA0PgIYAAB+zG53KDf/xyWIMWEGVwMAvo8ABgCAH/v2RLnKq2wKDjCrU5vmRpcDAD6PAAYAgB9zLj9MaBsui9lkcDUA4PsIYAAA+LEzDThYfggAnkAAAwDAj7la0NMBEQA8ggAGAIAfowMiAHgWAQwAAD91oqxK+cUVMplO7wEDADQ+AhgAAH4q59jp2a+OrULUPKiZwdUAgH8ggAEA4KdYfggAnkcAAwDAT+3+cQaMBhwA4DkEMAAA/JRzBqw7AQwAPIYABgCAH6qotmn/8VJJLEEEAE8igAEA4If2FZaqxu5Qy5AAtQ0PNrocAPAbBDAAAPzQTxtwmEwmg6sBAP9BAAMAwA85G3B05/5fAOBRBDAAAPyQqwMi+78AwKMIYAAA+BmHw6Ec7gEGAIYggAEA4GcOf39KJZU1CrSY1TmyudHlAIBfIYABAOBnnMsPu0Y3V4CFXwUAwJP4qQsAgJ9xdUDkBswA4HEEMAAA/AwNOADAOAQwAAD8TI6zBT0zYADgcQQwAAD8yMlT1Tr8/SlJBDAAMAIBDAAAP+Kc/Wrf0qoIa4DB1QCA/yGAAQDgR5wNOJj9AgBjEMAAAPAjzhkwOiACgDEIYAAA+BE6IAKAsQhgAAD4iWqbXXsLSiUxAwYARiGAAQDgJ/YfL1WVza6w4GZq39JqdDkA4JcIYAAA+ImfNuAwmUwGVwMA/okABgCA
n3AGMJYfAoBxCGAAAPiJ3XRABADDEcAAAPADDofjTAt6OiACgGEIYAAA+IH84gp9X16tZmaTukQ1N7ocAPBbBDAAAPyAc/9Xl6jmCg6wGFwNAPgvAhgAAH7AufywO/u/AMBQBDAAAPwADTgAoGkggAEA4AdcLehpwAEAhiKAAQDg40ora/TNd+WSWIIIAEYjgAEA4OP25J+e/WobHqxWoYEGVwMA/o0ABgCAj2P5IQA0HQQwAAB8HA04AKDpIIABAODjdh8rkcT+LwBoCghgAAD4sBqbXbnHWIIIAE0FAQwAAB/2zXdlqqyxKyTQoo6tQowuBwD8HgEMAAAf9n8/NuDo1jZMZrPJ4GoAAAQwAAB8WM6P+79YfggATQMBDAAAH3amA2KEwZUAACQCGAAAPo17gAFA00IAAwDARxWWVKiotFJmk5QQHWZ0OQAAEcAAAPBZzv1fndqEyhpoMbgaAIDURALYU089pfj4eAUHBysxMVHbtm077/isrCwlJCTIarUqLi5OM2fOVEVFhet5m82m9PR0derUSVarVZ07d9aCBQvkcDhcYyZNmiSTyVTra8SIEY12jQAAeNqZ5Yfs/wKApqKZ0QWsXbtWqampWr58uRITE5WVlaXhw4drz549ioqKqjN+zZo1SktL04oVKzRo0CDl5eW5wlRmZqYkacmSJVq2bJlWrlypnj176tNPP9XkyZMVERGh6dOnu841YsQI/f3vf3c9DgoKavwLBgDAQ5wNOLrHsPwQAJoKwwNYZmam7rzzTk2ePFmStHz5cr311ltasWKF0tLS6ozfvHmzBg8erPHjx0uS4uPjNW7cOG3durXWmNGjR2vUqFGuMS+88EKdmbWgoCC1bdu2sS4NAABD5bg6INKAAwCaCkMDWFVVlXbs2KE5c+a4jpnNZiUlJWnLli1nfc2gQYO0atUqbdu2TQMGDNCBAwf09ttv69Zbb6015tlnn1VeXp4uvfRS7dq1Sxs3bnTNkDlt2LBBUVFRatmypa655hotXLhQrVu3bpyLbUS7Dv2gYydPGV0GAKAJsdmlA8dLJdEBEQCaEkMDWFFRkWw2m6Kjo2sdj46OVm5u7llfM378eBUVFWnIkCFyOByqqanR1KlTNXfuXNeYtLQ0FRcXq1u3brJYLLLZbFq0aJEmTJjgGjNixAjdcMMN6tSpk/bv36+5c+dq5MiR2rJliyyWuhuVKysrVVlZ6XpcXFx8sZffYFZs+lpvfH7U6DIAAE1Qm+aBigoLNroMAMCPDF+CWF8bNmzQ4sWL9fTTTysxMVH79u3TfffdpwULFig9PV2StG7dOq1evVpr1qxRz5499fnnn2vGjBmKjY1VSkqKJGns2LGuc/bq1UuXX365OnfurA0bNujXv/51nffNyMjQI4884pmLrKdObULVr2NLo8sAADQxJpN0S/8ORpcBAPgJk+OnrQE9rKqqSiEhIXr55Zc1ZswY1/GUlBT98MMPeuONN+q85qqrrtIvfvELLV261HVs1apVmjJlikpLS2U2mxUXF6e0tDTdfffdrjELFy7UqlWrzjmzJkmRkZFauHCh7rrrrjrPnW0GLC4uTidPnlR4OEs7AAAAAH9VXFysiIiIC8oGhrahDwwMVN++fZWdne06ZrfblZ2drYEDB571NeXl5TKba5ftXDLozJLnGmO3289Zy+HDh/Xdd98pJibmrM8HBQUpPDy81hcAAAAA1IfhSxBTU1OVkpKifv36acCAAcrKylJZWZmrK+LEiRPVrl07ZWRkSJKSk5OVmZmpPn36uJYgpqenKzk52RXEkpOTtWjRInXo0EE9e/bUZ599pszMTN12222SpNLSUj3yyCO68cYb1bZtW+3fv1+zZ89Wly5dNHz4cGO+EQAAAAB8nuEB7JZbbtHx48c1f/585efnq3fv3lq/fr2rMcfBgwdrzWbNmzdPJpNJ8+bN05EjRxQZGekKXE5PPPGE0tPTNW3aNBUWFio2NlZ33XWX5s+f
L+n0bNgXX3yhlStX6ocfflBsbKyGDRumBQsWcC8wAAAAAI3G0D1g3qw+6zwBAAAA+C6v2QMGAAAAAP6EAAYAAAAAHkIAAwAAAAAPIYABAAAAgIcQwAAAAADAQwhgAAAAAOAhBDAAAAAA8BACGAAAAAB4CAEMAAAAADyEAAYAAAAAHkIAAwAAAAAPIYABAAAAgIcQwAAAAADAQwhgAAAAAOAhzYwuwFs5HA5JUnFxscGVAAAAADCSMxM4M8L5EMDcVFJSIkmKi4szuBIAAAAATUFJSYkiIiLOO8bkuJCYhjrsdruOHj2qsLAwmUwmQ2spLi5WXFycDh06pPDwcENrgX/gMwdP4zMHT+LzBk/jM+f9HA6HSkpKFBsbK7P5/Lu8mAFzk9lsVvv27Y0uo5bw8HD+o4VH8ZmDp/GZgyfxeYOn8Znzbj838+VEEw4AAAAA8BACGAAAAAB4CAHMBwQFBemhhx5SUFCQ0aXAT/CZg6fxmYMn8XmDp/GZ8y804QAAAAAAD2EGDAAAAAA8hAAGAAAAAB5CAAMAAAAADyGAAQAAAICHEMB8wFNPPaX4+HgFBwcrMTFR27ZtM7ok+KiMjAz1799fYWFhioqK0pgxY7Rnzx6jy4Kf+NOf/iSTyaQZM2YYXQp82JEjR/S73/1OrVu3ltVqVa9evfTpp58aXRZ8kM1mU3p6ujp16iSr1arOnTtrwYIFoj+e7yOAebm1a9cqNTVVDz30kHbu3KkrrrhCw4cPV2FhodGlwQd99NFHuvvuu/XJJ5/ovffeU3V1tYYNG6aysjKjS4OP2759u5555hldfvnlRpcCH/b9999r8ODBCggI0DvvvKPdu3frscceU8uWLY0uDT5oyZIlWrZsmZ588knl5ORoyZIlevTRR/XEE08YXRoaGW3ovVxiYqL69++vJ598UpJkt9sVFxene++9V2lpaQZXB193/PhxRUVF6aOPPtIvf/lLo8uBjyotLdWVV16pp59+WgsXLlTv3r2VlZVldFnwQWlpadq0aZM+/vhjo0uBH7j++usVHR2tv/3tb65jN954o6xWq1atWmVgZWhszIB5saqqKu3YsUNJSUmuY2azWUlJSdqyZYuBlcFfnDx5UpLUqlUrgyuBL7v77rs1atSoWj/rgMbw5ptvql+/fvqf//kfRUVFqU+fPvrrX/9qdFnwUYMGDVJ2drby8vIkSbt27dLGjRs1cuRIgytDY2tmdAFwX1FRkWw2m6Kjo2sdj46OVm5urkFVwV/Y7XbNmDFDgwcP1mWXXWZ0OfBRL774onbu3Knt27cbXQr8wIEDB7Rs2TKlpqZq7ty52r59u6ZPn67AwEClpKQYXR58TFpamoqLi9WtWzdZLBbZbDYtWrRIEyZMMLo0NDICGAC33H333frqq6+0ceNGo0uBjzp06JDuu+8+vffeewoODja6HPgBu92ufv36afHixZKkPn366KuvvtLy5csJYGhw69at0+rVq7VmzRr17NlTn3/+uWbMmKHY2Fg+bz6OAObF2rRpI4vFooKCglrHCwoK1LZtW4Oqgj+455579K9//Uv/+c9/1L59e6PLgY/asWOHCgsLdeWVV7qO2Ww2/ec//9GTTz6pyspKWSwWAyuEr4mJiVGPHj1qHevevbteeeUVgyqCL5s1a5bS0tI0duxYSVKvXr307bffKiMjgwDm49gD5sUCAwPVt29fZWdnu47Z7XZlZ2dr4MCBBlYGX+VwOHTPPffotdde0wcffKBOnToZXRJ82K9//Wt9+eWX+vzzz11f/fr104QJE/T5558TvtDgBg8eXOfWGnl5eerYsaNBFcGXlZeXy2yu/au4xWKR3W43qCJ4CjNgXi41NVUpKSnq16+fBgwYoKysLJWVlWny5MlGlwYfdPfdd2vNmjV64403FBYWpvz8fElSRESErFarwdXB14SFhdXZXxgaGqrWrVuz7xCNYubMmRo0aJAWL16sm2++Wdu2bdOzzz6rZ5991ujS
4IOSk5O1aNEidejQQT179tRnn32mzMxM3XbbbUaXhkZGG3of8OSTT2rp0qXKz89X79699Ze//EWJiYlGlwUfZDKZznr873//uyZNmuTZYuCXhg4dSht6NKp//etfmjNnjvbu3atOnTopNTVVd955p9FlwQeVlJQoPT1dr732mgoLCxUbG6tx48Zp/vz5CgwMNLo8NCICGAAAAAB4CHvAAAAAAMBDCGAAAAAA4CEEMAAAAADwEAIYAAAAAHgIAQwAAAAAPIQABgAAAAAeQgADAAAAAA8hgAEAYACTyaTXX3/d6DIAAB5GAAMA+J1JkybJZDLV+RoxYoTRpQEAfFwzowsAAMAII0aM0N///vdax4KCggyqBgDgL5gBAwD4paCgILVt27bWV8uWLSWdXh64bNkyjRw5UlarVZdccolefvnlWq//8ssvdc0118hqtap169aaMmWKSktLa41ZsWKFevbsqaCgIMXExOiee+6p9XxRUZF++9vfKiQkRF27dtWbb77ZuBcNADAcAQwAgLNIT0/XjTfeqF27dmnChAkaO3ascnJyJEllZWUaPny4WrZsqe3bt+ull17S+++/XytgLVu2THfffbemTJmiL7/8Um+++aa6dOlS6z0eeeQR3Xzzzfriiy903XXXacKECTpx4oRHrxMA4Fkmh8PhMLoIAAA8adKkSVq1apWCg4NrHZ87d67mzp0rk8mkqVOnatmyZa7nfvGLX+jKK6/U008/rb/+9a964IEHdOjQIYWGhkqS3n77bSUnJ+vo0aOKjo5Wu3btNHnyZC1cuPCsNZhMJs2bN08LFiyQdDrUNW/eXO+88w570QDAh7EHDADgl371q1/VCliS1KpVK9efBw4cWOu5gQMH6vPPP5ck5eTk6IorrnCFL0kaPHiw7Ha79uzZI5PJpKNHj+rXv/71eWu4/PLLXX8ODQ1VeHi4CgsL3b0kAIAXIIABAPxSaGhonSWBDcVqtV7QuICAgFqPTSaT7HZ7Y5QEAGgi2AMGAMBZfPLJJ3Ued+/eXZLUvXt37dq1S2VlZa7nN23aJLPZrISEBIWFhSk+Pl7Z2dkerRkA0PQxAwYA8EuVlZXKz8+vdaxZs2Zq06aNJOmll15Sv379NGTIEK1evVrbtm3T3/72N0nShAkT9NBDDyklJUUPP/ywjh8/rnvvvVe33nqroqOjJUkPP/ywpk6dqqioKI0cOVIlJSXatGmT7r33Xs9eKACgSSGAAQD80vr16xUTE1PrWEJCgnJzcyWd7lD44osvatq0aYqJidELL7ygHj16SJJCQkL07rvv6r777lP//v0VEhKiG2+8UZmZma5zpaSkqKKiQo8//rj+8Ic/qE2bNrrppps8d4EAgCaJLogAAPwXk8mk1157TWPGjDG6FACAj2EPGAAAAAB4CAEMAAAAADyEPWAAAPwXVucDABoLM2AAAAAA4CEEMAAAAADwEAIYAAAAAHgIAQwAAAAAPIQABgAAAAAeQgADAAAAAA8hgAEAAACAhxDAAAAAAMBDCGAAAAAA4CH/H4x+EQN73jhbAAAAAElFTkSuQmCC",
      "text/plain": [
       "<Figure size 1000x600 with 1 Axes>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "import pandas as pd\n",
    "from matplotlib import pyplot as plt\n",
    "\n",
    "eval_small_df = pd.read_csv('./medical_bge_small/eval/binary_classification_evaluation_med-dev_results.csv')\n",
    "plt.figure(figsize=(10, 6))\n",
    "plt.plot(eval_small_df['epoch'], eval_small_df['cossim_accuracy'], label='medical_bge_small')\n",
    "plt.xlabel('Epoch')\n",
    "plt.ylabel('cossim_accuracy')\n",
    "plt.legend()\n",
    "plt.show()"
   ]
  },
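  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. Building a Dataset with an LLM\n",
    "\n",
    "Chinese datasets that match a given project are hard to find online. In that case we can use a large language model to build our own dataset from the knowledge the project covers. LangChain offers efficient APIs for building LLM applications, but it does not cover model fine-tuning, so in this section we use custom helper functions that call Zhipu and OpenAI models to build a dataset from our own knowledge base.\n",
    "> This section draws on parts of [LlamaIndex Finetune Embeddings](https://docs.llamaindex.ai/en/stable/examples/finetuning/embeddings/finetune_embedding/) and [langchain.chains.qa_generation.base.QAGenerationChain](https://api.python.langchain.com/en/latest/chains/langchain.chains.qa_generation.base.QAGenerationChain.html#langchain.chains.qa_generation.base.QAGenerationChain), with changes to the dataset structure and to how the LLM is called.\n",
    "\n",
    "We first load the content with `PyMuPDFLoader`. The loaded `pdf_pages` is a `List[Document]`; each `Document` has attributes such as `page_content` (the body text) and `metadata`."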
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.document_loaders.pdf import PyMuPDFLoader\n",
    "\n",
    "# Create a PyMuPDFLoader instance; the argument is the path of the PDF to load\n",
    "loader = PyMuPDFLoader(\"../../../data_base/knowledge_db/pumkin_book/pumpkin_book.pdf\")\n",
    "\n",
    "# Call load() to read the PDF; it returns one Document per page\n",
    "pdf_pages = loader.load()\n",
    "# The Pumpkin Book's body text starts at pdf_pages[13], so take two pages from there\n",
    "train_pages = pdf_pages[13:15]"
   ]
  },
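  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next we iterate over the `Document` objects and clean each `page_content`, removing meaningless characters and boilerplate text so the data is easier for the LLM to understand."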
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "import re\n",
    "\n",
    "def clean_text(text: str) -> str:\n",
    "    # Remove the promotional banner and video link at the top and bottom of each page\n",
    "    text = re.sub(r'→_→\\n欢迎去各大电商平台选购纸质版南瓜书《机器学习公式详解》\\n←_←', '', text)\n",
    "    text = re.sub(r'→_→\\n配套视频教程：https://www.bilibili.com/video/BV1Mh411e7VU\\n←_←', '', text)\n",
    "    # Remove all whitespace (spaces, tabs and newlines); \\s already matches newlines,\n",
    "    # so no separate newline pass is needed\n",
    "    text = re.sub(r'\\s+', '', text)\n",
    "\n",
    "    return text\n",
    "\n",
    "for page in train_pages:\n",
    "    page.page_content = clean_text(page.page_content)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Pass the cleaned data to `docs_generate_qa_pairs`, which takes three parameters:\n",
    "1. `docs`: the documents, as a `List[Document]`.\n",
    "2. `num_questions_per_page`: the number of QA pairs to generate per page; defaults to 2.\n",
    "3. `model`: the model used to generate the QA pairs; defaults to Zhipu's `glm-4`.\n"
   ]
  },
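  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The implementation of `docs_generate_qa_pairs` lives in the accompanying `generate_qa_pairs` module and is not shown here. As a rough sketch of the general idea only, the snippet below builds a prompt for each page, asks a model for question-answer pairs as JSON, and keeps only well-formed results. The names `build_prompt`, `parse_qa_json`, `generate_pairs` and the `llm` callable are hypothetical stand-ins, not the module's actual API, and `fake_llm` replaces a real model call so the sketch runs offline.\n",
    "\n",
    "```python\n",
    "import json\n",
    "from typing import Callable, Dict, List\n",
    "\n",
    "def build_prompt(context: str, num_questions: int) -> str:\n",
    "    # Ask for a strict JSON list so the reply can be parsed mechanically.\n",
    "    return (\n",
    "        f'Using only the context below, write {num_questions} question(s), '\n",
    "        'each answered with text from the context. '\n",
    "        'Reply with a JSON list of objects that have the keys query and answer. '\n",
    "        f'Context: {context}'\n",
    "    )\n",
    "\n",
    "def parse_qa_json(reply: str) -> List[Dict[str, str]]:\n",
    "    # Return an empty list when the reply is not the expected JSON structure.\n",
    "    try:\n",
    "        data = json.loads(reply)\n",
    "    except json.JSONDecodeError:\n",
    "        return []\n",
    "    if not isinstance(data, list):\n",
    "        return []\n",
    "    pairs = []\n",
    "    for item in data:\n",
    "        if not isinstance(item, dict):\n",
    "            continue\n",
    "        query, answer = item.get('query'), item.get('answer')\n",
    "        if isinstance(query, str) and isinstance(answer, str):\n",
    "            pairs.append({'query': query, 'answer': answer})\n",
    "    return pairs\n",
    "\n",
    "def generate_pairs(texts: List[str], llm: Callable[[str], str], num_questions: int = 2) -> List[Dict[str, str]]:\n",
    "    # One model call per text; malformed replies contribute nothing.\n",
    "    pairs = []\n",
    "    for text in texts:\n",
    "        pairs.extend(parse_qa_json(llm(build_prompt(text, num_questions))))\n",
    "    return pairs\n",
    "\n",
    "def fake_llm(prompt: str) -> str:\n",
    "    # Stand-in for a real chat-model call, used only to make the sketch runnable.\n",
    "    return json.dumps([{'query': 'What is a training set?', 'answer': 'The samples used to fit the model.'}])\n",
    "\n",
    "print(generate_pairs(['some page text'], fake_llm, num_questions=1))\n",
    "```\n",
    "\n",
    "Passing the model as a plain callable keeps the sketch independent of any particular provider SDK; a real implementation would wrap the Zhipu or OpenAI client behind the same interface."
   ]
  },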
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "100%|██████████| 2/2 [00:46<00:00, 23.31s/it]\n"
     ]
    }
   ],
   "source": [
    "from generate_qa_pairs import docs_generate_qa_pairs\n",
    "\n",
    "qa_pairs = docs_generate_qa_pairs(docs=train_pages)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The result is a `QaPairs` object. Each entry of `QaPairs.qa_pairs` holds a question under the `'query'` key and its answer under the `'answer'` key."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "第1个问题：请根据提供的上下文信息，解释“特征工程”在机器学习中的作用，并给出一个具体的例子。\n",
      "第1个答案：显然，用中文书写向量的方式不够“数学”，因此需要将属性值进一步数值化，具体例子参见“西瓜书”第3章3.2。此外，仅靠以上3个特征来刻画西瓜显然不够全面细致，因此还需要扩展更多维度的特征，一般称此类与特征处理相关的工作为“特征工程”。\n",
      "\n",
      "第2个问题：在机器学习的流程中，为什么要将数据集分为训练集和测试集？这样做有什么好处？\n",
      "第2个答案：模型：机器学习的一般流程如下：首先收集若干样本（假设此时有100个），然后将其分为训练样本（80个）和测试样本（20个），其中80个训练样本构成的集合称为“训练集”，20个测试样本构成的集合称为“测试集”，接着选用某个机器学习算法，让其在训练集上进行“学习”（或称为“训练”），然后产出得到“模型”（或称为“学习器”），最后用测试集来测试模型的效果。\n",
      "\n",
      "第3个问题：如何根据标记的取值类型来区分机器学习任务中的“分类”和“回归”任务？请提供原文中相应的定义和标记数值化的例子。\n",
      "第3个答案：当标记取值为离散型时，称此类任务为“分类”，例如学习西瓜是好瓜还是坏瓜、学习猫的图片是白猫还是黑猫等。当分类的类别只有两个时，称此类任务为“二分类”，通常称其中一个为“正类”，另一个为“反类”或“负类”；当分类的类别超过两个时，称此类任务为“多分类”。由于标记也属于样本的一部分，通常也需要参与运算，因此也需要将其数值化，例如对于二分类任务，通常将正类记为1，反类记为0，即Y={0,1}。这只是一般默认的做法，具体标记该如何数值化可根据具体机器学习算法进行相应地调整，例如第6章的支持向量机算法则采用的是Y={−1,+1}；当标记取值为连续型时，称此类任务为“回归”，例如学习预测西瓜的成熟度、学习预测未来的房价等。由于是连续型，因此标记的所有可能取值无法直接罗列，通常只有取值范围，回归任务的标记取值范围通常是整个实数域R，即Y=R。\n",
      "\n",
      "第4个问题：在机器学习中，什么是“泛化”能力？请根据原文解释为什么泛化能力是衡量一个模型好坏的关键，并给出相应的例子。\n",
      "第4个答案：泛化：由于机器学习的目标是根据已知来对未知做出尽可能准确的判断，因此对未知事物判断的准确与否才是衡量一个模型好坏的关键，我们称此为“泛化”能力。例如学习西瓜好坏时，假设训练集中共有3个样本：{(x1=(青绿;蜷缩),y1=好瓜),(x2=(乌黑;蜷缩),y2=好瓜),(x3=(浅白;蜷缩),y3=好瓜)}，同时假设判断西瓜好坏的真相是“只要根蒂蜷缩就是好瓜”，如果应用算法A在此训练集上训练得到模型fa(x)，模型a学到的规律是“色泽等于青绿、乌黑或者浅白时，同时根蒂蜷缩即为好瓜，否则便是坏瓜”，再应用算法B在此训练集上训练得到模型fb(x)，模型fb(x)学到的规律是“只要根蒂蜷缩就是好瓜”，因此对于一个未见过的西瓜样本x=(金黄;蜷缩)来说，模型fa(x)给出的预测结果为“坏瓜”，模型fb(x)给出的预测结果为“好瓜”，此时我们称模型fb(x)的泛化能力优于模型fa(x)。通过以上举例可知，尽管模型fa(x)和模型fb(x)对训练集学得一样好，即两个模型对训练集中每个样本的判断都对，但是其所学到的规律是不同的。导致此现象最直接的原因是算法的不同，但是算法通常是有限的，可穷举的，尤其是在特定任务场景下可使用的算法更是有限，因此，数据便是导致此现象的另一重要原因，这也就是机器学习领域常说的“数据决定模型的上限，而算法则是让模型无限逼近上限”。\n",
      "\n"
     ]
    }
   ],
   "source": [
    "for i in range(len(qa_pairs.qa_pairs)):\n",
    "    print('第{}个问题：{}'.format(i + 1, qa_pairs.qa_pairs[i]['query']))\n",
    "    print('第{}个答案：{}'.format(i + 1, qa_pairs.qa_pairs[i]['answer']), end='\\n\\n')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The generated data can be saved in JSON format with `save_json`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [],
   "source": [
    "qa_pairs.save_json(\"train_dataset.json\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The saved data can be read back with the `from_json` method."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [],
   "source": [
    "from generate_qa_pairs import QaPairs\n",
    "\n",
    "qa_pairs = QaPairs.from_json('train_dataset.json')"
   ]
  },
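  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The real `QaPairs` class is defined in the `generate_qa_pairs` module and is not reproduced here. Assuming it simply stores a list of dicts with `'query'` and `'answer'` keys, a save/load round trip could look roughly like the following. `QaPairsSketch` is a hypothetical class written for illustration, and the on-disk layout of the real `train_dataset.json` may differ.\n",
    "\n",
    "```python\n",
    "import json\n",
    "import os\n",
    "import tempfile\n",
    "from dataclasses import dataclass, field\n",
    "from typing import Dict, List\n",
    "\n",
    "@dataclass\n",
    "class QaPairsSketch:\n",
    "    # Hypothetical container, not the real QaPairs class.\n",
    "    qa_pairs: List[Dict[str, str]] = field(default_factory=list)\n",
    "\n",
    "    def save_json(self, path: str) -> None:\n",
    "        # ensure_ascii=False keeps Chinese text human-readable in the file.\n",
    "        with open(path, 'w', encoding='utf-8') as f:\n",
    "            json.dump(self.qa_pairs, f, ensure_ascii=False, indent=2)\n",
    "\n",
    "    @classmethod\n",
    "    def from_json(cls, path: str) -> 'QaPairsSketch':\n",
    "        with open(path, 'r', encoding='utf-8') as f:\n",
    "            return cls(qa_pairs=json.load(f))\n",
    "\n",
    "original = QaPairsSketch([{'query': '什么是泛化?', 'answer': '对未知样本做出准确判断的能力。'}])\n",
    "path = os.path.join(tempfile.mkdtemp(), 'train_dataset.json')\n",
    "original.save_json(path)\n",
    "restored = QaPairsSketch.from_json(path)\n",
    "print(restored == original)\n",
    "```\n",
    "\n",
    "Writing with an explicit `utf-8` encoding avoids platform-dependent defaults when the questions and answers contain Chinese text."
   ]
  },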
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The generated `qa_pairs` can easily be converted into a `DataLoader` for fine-tuning the embedding model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sentence_transformers import InputExample\n",
    "from torch.utils.data import DataLoader\n",
    "\n",
    "# Wrap each QA pair in an InputExample and collect them in a list\n",
    "train_examples = [InputExample(texts=[qa_pair['query'], qa_pair['answer']]) for qa_pair in qa_pairs.qa_pairs]\n",
    "# Wrap the examples in a DataLoader\n",
    "train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=32)"
   ]
  },
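  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> The success rate of QA-pair generation depends on the text and on `num_questions_per_page`. In our tests on the Pumpkin Book with `glm-4`, correctly formatted QA pairs were produced 92.44% of the time with `num_questions_per_page` set to 1, dropping to 22.86% with `num_questions_per_page` set to 2.\n",
    "\n",
    "> There is also a custom helper, `list_generate_qa_pairs`. It behaves exactly like `docs_generate_qa_pairs` except that it takes a `List[str]`, so it can also generate QA pairs from data loaded by other means."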
   ]
  },
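  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Because malformed replies are common, it can help to measure the format success rate on your own data and to re-ask the model when a reply does not parse. The snippet below is a minimal sketch under the assumption that a valid reply is a JSON list of objects with string `query` and `answer` fields; `is_valid_reply`, `success_rate` and `ask_with_retry` are hypothetical helpers, not part of `generate_qa_pairs`, and the figures quoted above were not produced with this code.\n",
    "\n",
    "```python\n",
    "import json\n",
    "from typing import Callable, List, Optional\n",
    "\n",
    "def is_valid_reply(reply: str, expected: int) -> bool:\n",
    "    # Valid means: a JSON list of exactly `expected` objects with string query/answer.\n",
    "    try:\n",
    "        data = json.loads(reply)\n",
    "    except json.JSONDecodeError:\n",
    "        return False\n",
    "    if not isinstance(data, list) or len(data) != expected:\n",
    "        return False\n",
    "    return all(\n",
    "        isinstance(d, dict)\n",
    "        and isinstance(d.get('query'), str)\n",
    "        and isinstance(d.get('answer'), str)\n",
    "        for d in data\n",
    "    )\n",
    "\n",
    "def success_rate(replies: List[str], expected: int) -> float:\n",
    "    # Fraction of replies that match the required format.\n",
    "    if not replies:\n",
    "        return 0.0\n",
    "    return sum(is_valid_reply(r, expected) for r in replies) / len(replies)\n",
    "\n",
    "def ask_with_retry(llm: Callable[[str], str], prompt: str, expected: int, max_attempts: int = 3) -> Optional[str]:\n",
    "    # Re-ask until the reply is well-formed or the attempts run out.\n",
    "    for _ in range(max_attempts):\n",
    "        reply = llm(prompt)\n",
    "        if is_valid_reply(reply, expected):\n",
    "            return reply\n",
    "    return None\n",
    "\n",
    "good = json.dumps([{'query': 'q', 'answer': 'a'}])\n",
    "bad = 'Sure, here are your questions'\n",
    "print(success_rate([good, bad, good, bad], expected=1))\n",
    "\n",
    "# A stand-in model that fails once and then returns a well-formed reply.\n",
    "replies = iter([bad, good])\n",
    "print(ask_with_retry(lambda prompt: next(replies), 'prompt', expected=1) == good)\n",
    "```\n",
    "\n",
    "Retrying costs extra model calls, so with a low per-call success rate it may be cheaper to request one question per call and call the model more often."
   ]
  },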
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**References:**\n",
    "1. [Hugging Face:Train and Fine-Tune Sentence Transformers Models](https://huggingface.co/blog/how-to-train-sentence-transformers)\n",
    "2. [Sentence Transformers > Losses](https://sbert.net/docs/package_reference/sentence_transformer/losses.html)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "llm_universe_2.x",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.14"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
